Abstract

We consider the problem of lossy source coding with a mismatched distortion measure. That is, we investigate
what distortion guarantees can be made with respect to a distortion measure $\tilde{\rho}$ for a source code designed
to achieve distortion less than $D$ with respect to a distortion measure $\rho$. We find a single-letter characterization of
this mismatch distortion and study its properties. These results give insight into the robustness of lossy
source coding with respect to modeling errors in the distortion measure. They also provide guidelines on how to
choose a good tractable approximation of an intractable distortion measure.

I. Introduction

A. Problem Formulation 

Consider a source alphabet $\mathcal{X}$ and a reconstruction alphabet $\mathcal{Y}$, a source $\{X_i\}_{i \ge 1}$ with each $X_i$ taking values
in $\mathcal{X}$, and two distortion measures $\rho_n, \tilde{\rho}_n : \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}_+$. Assume we have access to an oracle that,
when queried, produces a source code $f_n$ (i.e., a mapping $f_n : \mathcal{X}^n \to \mathcal{Y}^n$) such that

$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D.$$

What guarantees can we make a priori (i.e., before querying the oracle) about $\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n))$? As a
second question, assume we have access to an oracle that, when queried, produces a source code $f_n$ such
that$^1$

$$\frac{1}{n} \log |f_n(\mathcal{X}^n)| \le R,$$
$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D.$$

What guarantees can we make a priori about $\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n))$?

This problem has the following operational significance. Let a source code with expected distortion
according to $\rho$ of at most $D$ be given. Assume that, instead of using this source code with respect to $\rho$, we
decide to use it with respect to $\tilde{\rho}$. Such a situation occurs if constructing a source code for $\tilde{\rho}$ is not
feasible or if $\tilde{\rho}$ is not fully known when constructing the source code. We are then faced with a mismatch
in the distortion measure, and the best distortion guarantee mentioned in the opening paragraph provides
a measure of how severe this mismatch is.

As an example, for an image compression problem, $\tilde{\rho}$ is determined by the human visual system, and
any tractable model $\rho$ of it can necessarily be only an approximation. To be more specific, assume $\rho$
is taken to be squared error. While it is well known that this is not a faithful model for the human visual
system, it is nevertheless often used in practice due to its simplicity. Assume then that we choose one of
the many available source coding schemes for squared error distortion $\rho$. This source coding scheme will
have some distortion guarantee for $\rho$ (the distortion measure it is designed for). The best performance
guarantee mentioned in the opening paragraph then allows us to translate this distortion guarantee for $\rho$ into a
distortion guarantee for $\tilde{\rho}$. If, in addition, we also fix the rate of the source coding scheme, we are able
to obtain a tighter performance guarantee (the second question in the opening paragraph).

This work was supported in part by NSF under Grant No. CCF-0515109, and by HP through the MIT/HP Alliance. 
The authors are with the Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 
Cambridge, MA 02139, USA. Email: {uniesen, devavrat, gww}@mit.edu
$^1$ $|f_n(\mathcal{X}^n)|$ denotes the cardinality of the range of the function $f_n$.

In other words, an answer to the above questions allows us to analyze the robustness of coding schemes
to modeling errors (or mismatch in general) in the distortion measure.



B. Related Work 

The question of mismatched distortion measures in source coding has previously been considered
in [1], [2], [3], [4], and [5]. In these works the mismatch is only with respect to the encoding part of the
source code, whereas at least the decoder is matched to the proper distortion measure. This differs from
the setup here, where the mismatch is with respect to both the encoder and the decoder. We comment
on the precise differences in the following paragraphs.

In [1], a partial order among distortion measures is defined such that $\rho \ge \tilde{\rho}$ if for every source code
(consisting of an encoder $g_n : \mathcal{X}^n \to \{1, \ldots, \exp(nR)\}$ and a decoder $\phi_n : \{1, \ldots, \exp(nR)\} \to \mathcal{Y}^n$)
satisfying $\mathbb{E}\rho_n(X^n, \phi_n(g_n(X^n))) \le D$ there exists a second decoder $\tilde{\phi}_n$ satisfying
$\mathbb{E}\tilde{\rho}_n(X^n, \tilde{\phi}_n(g_n(X^n))) \le D$. Thus, in this setup, the encoder $g_n$ is designed for a mismatched distortion
measure $\rho$, whereas the decoder $\tilde{\phi}_n$ is matched to the distortion measure $\tilde{\rho}$.

In [2], the following problem is considered. Fix a codebook $\mathcal{C} \subset \mathcal{Y}^n$, and let $g_n : \mathcal{X}^n \to \mathcal{C}$ be an
optimal encoder for this codebook $\mathcal{C}$ with respect to $\rho$. Find a codebook $\mathcal{C}$ and a decoder $\phi_n : \mathcal{C} \to \mathcal{Y}^n$ such
that $\mathbb{E}\tilde{\rho}_n(X^n, \phi_n(g_n(X^n)))$ is minimized. Again, the mismatch is only with respect to the encoder $g_n$,
whereas the decoder as well as the codebook $\mathcal{C}$ are matched to the distortion measure $\tilde{\rho}$.

In [3], the author considers the problem of finding an encoder $g_n : \mathcal{X}^n \to \{1, \ldots, \exp(nR)\}$ such
that there exists a decoder $\phi_n : \{1, \ldots, \exp(nR)\} \to \mathcal{Y}^n$ satisfying $\mathbb{E}\rho_n(X^n, \phi_n(g_n(X^n))) \le D$ while
maximizing $\inf_{\tilde{\phi}_n} \mathbb{E}\tilde{\rho}_n(X^n, \tilde{\phi}_n(g_n(X^n)))$. In other words, the goal is to find an encoder that guarantees
distortion at most $D$ with respect to $\rho$, while making sure that this code has maximum possible distortion
with respect to $\tilde{\rho}$. As in the previous cases, the mismatch is only with respect to the encoder, whereas the
decoder $\tilde{\phi}_n$ is matched to the distortion measure $\tilde{\rho}$.

In [4, Problem 2.2.14] and [5], the problem of lossy source coding with respect to a class of distortion
measures is considered: Given a class of distortion measures $\Gamma$, we want to find a source code
$f_n : \mathcal{X}^n \to \mathcal{Y}^n$ such that $\sup_{\rho \in \Gamma} \mathbb{E}\rho_n(X^n, f_n(X^n))$ is minimized. In other words, $f_n$ is now "matched" to all
$\rho \in \Gamma$ simultaneously.



C. Modeling Perceptual Distortion Measures 

In this section, we briefly review the typical structure of perceptual distortion measures. This will
motivate the results presented in the main text. We focus here on distortion measures for image compression;
the structure of perceptual distortion measures for speech, audio, or video compression is similar (see [6]
for details on those distortion measures). The discussion here follows [7] and [8].

The typical structure of a perceptual distortion measure for image compression is depicted in Figure 1.
Here $x$ and $y$ are the original and reconstructed image, respectively, represented, for example, as vectors
of gray-scale values.

Fig. 1. Typical structure of a perceptual distortion measure. Adapted from [7].

The first block (termed front end) contains conversions from the image format to physical luminance
observed by the human eye and other calibrations. The second block performs a linear transform of the
two images, usually decomposing them into a number of spatial frequency bands with different orientations.
In the next block, the coefficients of each band are weighted to account for masking effects. The resulting
vectors of weighted coefficients of the original and reconstructed image are then subtracted. The last block
takes this vector of weighted differences and pools it into one real number. Usually this is done
by computing the $\ell_p$ norm of the difference vector for some $p \ge 1$, or by taking some power $r \ge 1$ of that
norm. Typical values of $p$ range from 2 to 4.

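Formally, the source and reconstruction alphabets are $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m$ or $\mathcal{X} = \mathcal{Y} = [0,1]^m$ for some
finite $m$. In the following, we write $x, y$ for elements of general $\mathcal{X}, \mathcal{Y}$, and we write $\boldsymbol{x}, \boldsymbol{y}$ if we want to
emphasize that $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m$ or $\mathcal{X} = \mathcal{Y} = [0,1]^m$. This means that $\rho$ is of the form

$$\rho(\boldsymbol{x}, \boldsymbol{y}) = \big\| (v(x_1), \ldots, v(x_m)) W_{\boldsymbol{x}} - (v(y_1), \ldots, v(y_m)) W_{\boldsymbol{y}} \big\|_p^r,$$

and is sometimes simplified to

$$\rho(\boldsymbol{x}, \boldsymbol{y}) = \big\| \big([v(x_1), \ldots, v(x_m)] - [v(y_1), \ldots, v(y_m)]\big) W_{\boldsymbol{x}} \big\|_p^r. \tag{1}$$

Here $v : \mathbb{R} \to \mathbb{R}$ accounts for the front end, and $W : \mathbb{R}^m \to \mathbb{R}^{m \times k}$ accounts for the linear transform and masking.
Here (and in the following), we write for $a \in \mathbb{R}^k$ and $p \ge 1$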



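$$\|a\|_p \triangleq \begin{cases} \big(\sum_{i=1}^{k} |a_i|^p\big)^{1/p} & \text{if } p < \infty, \\ \max_{1 \le i \le k} |a_i| & \text{if } p = \infty. \end{cases}$$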
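The pooling structure in (1) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the weight matrix `W` is taken to be constant rather than depending on the image, the front end `v` defaults to the identity, and all function names are hypothetical.

```python
import math

def lp_norm(a, p):
    """l_p norm of a sequence of reals; p may be math.inf."""
    if p == math.inf:
        return max(abs(t) for t in a)
    return sum(abs(t) ** p for t in a) ** (1.0 / p)

def perceptual_distortion(x, y, W, v=lambda t: t, p=2.0, r=1.0):
    """Evaluate || ([v(x_1..m)] - [v(y_1..m)]) W ||_p ** r, following (1).

    W is an m-by-k matrix given as a list of rows. It is assumed constant
    here, whereas the model in the text allows it to depend on x.
    """
    diff = [v(xi) - v(yi) for xi, yi in zip(x, y)]
    k = len(W[0])
    # Row vector times matrix: one weighted coefficient difference per band.
    pooled = [sum(diff[i] * W[i][j] for i in range(len(diff))) for j in range(k)]
    return lp_norm(pooled, p) ** r

identity = [[1.0, 0.0], [0.0, 1.0]]
x, y = [0.0, 0.0], [3.0, 4.0]
print(perceptual_distortion(x, y, identity, p=2, r=2))  # plain squared error
print(perceptual_distortion(x, y, identity, p=4, r=1))  # l_4 pooling
```

With the identity front end and identity weights, the choice $p = 2$, $r = 2$ reduces to squared error, which makes it easy to compare the different pooling exponents on the same pair of inputs.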



D. Outline of Results 

We now discuss several questions that arise when trying to construct and use perceptual distortion 
measures for source coding. These questions motivate the results presented in this paper, and they are 
used as examples throughout. 

• The choice of $r$ and $p$ for the error pooling seems to vary quite considerably across different perceptual
distortion measures for image compression. [9] uses $p = 2, r = 1$, [10] uses $p = 2.4, r = 1$, [11]
uses $p = 4, r = 1$, and [12], [13] use $p = 2, r = 2$. It is therefore of interest to know how distortion
mismatch in these two parameters affects the performance of the source code. This is discussed in
Example 2 (using Theorems 1, 2, 3, and 4).

• Given a class of distortion measures $\Gamma$, [12] suggests the following approach to find the "best"
approximation $\rho \in \Gamma$ to the distortion measure implemented by the human visual system: Simulate
the (information theoretically) optimal encoding scheme for all $\rho \in \Gamma$, and determine experimentally
(i.e., by showing the original and distorted image to a human) the one yielding the smallest distortion.
This optimal distortion measure is then declared to be the best approximation. While this approach
does yield the best approximation $\rho \in \Gamma$ when used with the optimal infinite length source code, it
is not clear a priori if this $\rho$ will also yield a good approximation when used with a suboptimal source
code. Indeed, as we shall see in Example 2, there are situations in which the mismatch for the optimal
and (even only slightly) suboptimal source codes are very different. In Example 3 (using Theorem 5),
we provide conditions on $\Gamma$ and the source under which the $\rho$ found with this approach also yields
a good approximation when used with good but not optimal source codes. These conditions hold
for the model in [12] (with a few additional assumptions that are implicitly made there). Hence our
results provide evidence that the optimal approximation $\rho \in \Gamma$ found in [12] will also be good for
practical (and hence necessarily suboptimal) source codes.

• [13] proposes a vector quantizer design procedure for distortion measures of the form

$$\rho(\boldsymbol{x}, \boldsymbol{y}) = w_{\boldsymbol{x}} \|\boldsymbol{y} - \boldsymbol{x}\|_2^2, \tag{2}$$

where $w : \mathbb{R}^m \to \mathbb{R}$. Since this is considerably simpler than the standard model (1), the question
arises of how to find the $w_{\boldsymbol{x}}$ such that the resulting $\rho$ in (2) is "close" to one of the more complicated
form (1). Note that it is not immediately obvious what "close" should mean in this context. Indeed,
there are several such notions that are reasonable. In Example 5 we show what properties such a
notion should have. The problem posed by [13] discussed above is treated in detail in Example 1
(using Theorem 3) and Example 6 (using Corollaries 8 and 9).
• Essentially all models of perceptual distortion measures contain a number of parameters that are
usually chosen to be in "close agreement" with the behavior of the human visual system. Again, it is
not clear what "close agreement" should mean here. In Example 7 (using Proposition 10), a simple
such measure of closeness is proposed, providing a guideline for how to tune the parameters of a
perceptual distortion model to be used for source coding.



E. Organization 

The remainder of this paper is organized as follows. In Section II, we present our main results. Section III
contains the corresponding proofs. Section IV contains concluding remarks.



II. Main Results 

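In this section, we formally introduce the problem of source coding with distortion mismatch. To
simplify the exposition, and since it represents the case of most practical interest, we assume in the
following that $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m$ for some finite $m$. Most of the results are, however, also valid if the
alphabets are general Polish spaces (i.e., complete, separable, metric spaces). We let $\mathcal{B}(\mathcal{X} \times \mathcal{Y})$ be the
Borel sets of $\mathcal{X} \times \mathcal{Y}$. By $\mathcal{P}(\mathcal{X} \times \mathcal{Y})$, we denote the set of all probability measures on $(\mathcal{X} \times \mathcal{Y}, \mathcal{B}(\mathcal{X} \times \mathcal{Y}))$. For
$Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$, $Q_X$ denotes the $\mathcal{X}$ marginal of $Q$. For a measurable function $g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, we denote
by $\mathbb{E}_Q g(X, Y)$ or $\mathbb{E}_Q g$ the expectation of $g(X, Y)$ with respect to $Q$. For any $A \in \mathcal{B}(\mathcal{X} \times \mathcal{Y})$, we write
$\mathbb{E}_Q(g; A)$ for $\mathbb{E}_Q g 1_A$. $I(Q)$ denotes mutual information (in nats) between the random variables $(X, Y) \sim Q$.
Throughout this paper, we restrict attention to single-letter distortion measures, i.e., measurable functions
$\rho : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}_+$ with $\rho_n : \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}_+$ defined by

$$\rho_n(x^n, y^n) \triangleq \frac{1}{n} \sum_{i=1}^{n} \rho(x_i, y_i).$$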

We also assume throughout that the source $\{X_i\}_{i \ge 1}$ is i.i.d. with distribution $P \in \mathcal{P}(\mathcal{X})$. $R_\rho(D)$ and
$D_\rho(R)$ denote the rate-distortion and the distortion-rate function for the source $\{X_i\}_{i \ge 1}$ with respect
to the single-letter distortion measure $\rho$, i.e.,

$$R_\rho(D) \triangleq \inf_{Q_X = P,\ \mathbb{E}_Q \rho \le D} I(Q),$$

$$D_\rho(R) \triangleq \inf_{Q_X = P,\ I(Q) \le R} \mathbb{E}_Q \rho.$$
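As a familiar instance of these two definitions, consider an i.i.d. Gaussian vector source with covariance $\sigma^2 I$ on $\mathbb{R}^m$ and the squared Euclidean norm as distortion measure, for which both functions are known in closed form. The sketch below is an illustration using this standard textbook case, not a result of this paper; the function names and the parameter `dim` (standing for $m$) are hypothetical. Rates are in nats, matching the convention for $I(Q)$ above.

```python
import math

def gaussian_rate_distortion(d, variance=1.0, dim=1):
    """R(D) in nats for a N(0, variance * I) source on R^dim under ||y - x||^2."""
    if d <= 0:
        raise ValueError("distortion level must be positive")
    # The distortion budget is split evenly across the dim components.
    return max(0.0, 0.5 * dim * math.log(dim * variance / d))

def gaussian_distortion_rate(rate, variance=1.0, dim=1):
    """D(R), the inverse of gaussian_rate_distortion on its strictly positive range."""
    if rate < 0:
        raise ValueError("rate must be nonnegative")
    return dim * variance * math.exp(-2.0 * rate / dim)

# Round trip: D(R(D)) recovers D whenever R(D) > 0.
d = 0.5
print(gaussian_distortion_rate(gaussian_rate_distortion(d, dim=2), dim=2))
```

For `dim=2` and unit variance this gives $R_\rho(D) = \log(2/D)$, which is the rate-distortion function relevant to the two-dimensional Gaussian source considered in Example 4.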

Our results are divided into several parts. In Section II-A, we provide single-letter characterizations
of the mismatch distortion. In Section II-B, we investigate properties of these quantities. Section II-C
contains information on how to evaluate the single-letter characterizations of the mismatch distortion.
Section II-D considers the problem of finding a good representation of a distortion measure from a class
of simpler ones.

A. Single-Letter Characterizations

In this section, we provide single-letter characterizations of the smallest distortion with respect to $\tilde{\rho}$
that can be guaranteed for any source code (either with or without constraint on the rate $R$) designed for
distortion $D_\rho$ with respect to $\rho$.

Define

$$D_{\rho,\tilde{\rho}}(R, D_\rho) \triangleq \sup \mathbb{E}_Q \tilde{\rho}, \tag{3}$$

where the supremum is taken over all $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$, $\mathbb{E}_Q \rho \le D_\rho$ and $I(Q) \le R$. If
the set over which this supremum is taken is empty, we define $D_{\rho,\tilde{\rho}}(R, D_\rho) = -\infty$.

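Theorem 1. Let $\rho, \tilde{\rho}$ be distortion measures satisfying $\mathbb{E}_P \rho(X, y_0) < \infty$ for some $y_0 \in \mathcal{Y}$. For every
$D_{\tilde{\rho}} < \infty$ such that

$$0 \le D_{\tilde{\rho}} < \lim_{\delta \downarrow 0} D_{\rho,\tilde{\rho}}(R - \delta, D_\rho - \delta)$$

there exists a sequence of source codes $\{f_n\}_{n \ge 1}$ such that

$$\lim_{n \to \infty} \frac{1}{n} \log |f_n(\mathcal{X}^n)| \le R,$$

$$\limsup_{n \to \infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho,$$

$$\liminf_{n \to \infty} \mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \ge D_{\tilde{\rho}}.$$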

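Theorem 2. For any $n$ and any source code $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ such that

$$\frac{1}{n} \log |f_n(\mathcal{X}^n)| = R,$$

$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho,$$

we have$^2$

$$\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \le D_{\rho,\tilde{\rho}}(R+, D_\rho).$$

If, moreover, $R > R_\rho(D_\rho)$, then

$$\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \le D_{\rho,\tilde{\rho}}(R, D_\rho).$$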

Theorems 1 and 2 allow us to make guarantees about the performance of a source code constructed
with a mismatched distortion measure. Indeed, if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ is a source code of rate $R$ designed for
a distortion measure $\rho$ and distortion level $D_\rho$, then by Theorem 2, $f_n$ is also a source code for any
distortion measure $\tilde{\rho}$ and distortion level $D_{\rho,\tilde{\rho}}(R+, D_\rho)$. Moreover, this is essentially the best guarantee
one can make, since by Theorem 1 there exist source codes with the same blocklength $n$ and the same rate $R$
designed for distortion measure $\rho$ and distortion level $D_\rho$ that result in a distortion level of more than

$$D_{\rho,\tilde{\rho}}(R - \delta(n), D_\rho - \delta(n)) - \delta(n)$$

for distortion measure $\tilde{\rho}$, with $\delta(n) \to 0$ as $n \to \infty$. This answers the second question posed in the
introduction.

To answer the first question, we need to find the best distortion guarantee that is independent of the
rate $R$ of the source code. From Theorems 1 and 2, this best distortion guarantee is given by

$$\sup_{R \ge 0} D_{\rho,\tilde{\rho}}(R, D_\rho).$$

$^2$ For a real-valued function $g$, we write $g(x+) = \lim_{\delta \downarrow 0} g(x + \delta)$ and $g(x-) = \lim_{\delta \downarrow 0} g(x - \delta)$, assuming the limits exist.

Since $D_{\rho,\tilde{\rho}}(\cdot, D_\rho)$ is an increasing function, this is equal to

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho).$$

The next theorem considers this limit.



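Theorem 3. If

(i) $\rho, \tilde{\rho}$ are continuous

(ii) there exists $y_0 \in \mathcal{Y}$ such that $\mathbb{E}_P \rho(X, y_0) < \infty$

(iii) $D_\rho(\infty) < D_\rho < \infty$

then for any $\eta \ge 0$ the expectation

$$\mathbb{E}_P \sup_{y \in \mathcal{Y}} \big(\tilde{\rho}(X, y) - \eta \rho(X, y)\big)$$

is well defined and

$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \min_{\eta \ge 0} \Big( \eta D_\rho + \mathbb{E}_P \sup_{y \in \mathcal{Y}} \big(\tilde{\rho}(X, y) - \eta \rho(X, y)\big) \Big).$$

If, moreover,
(iv) $D_{\rho,\tilde{\rho}}(\infty, D_\rho) < \infty$,
then

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = D_{\rho,\tilde{\rho}}(\infty, D_\rho).$$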
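To get a feel for the minimization over $\eta$ in Theorem 3, it can help to evaluate it on finite alphabets, where the expectation becomes a finite sum and the objective is convex and piecewise linear in $\eta$. The following sketch does exactly that. The restriction to finite alphabets and the function name are assumptions made for illustration, not part of the theorem as stated.

```python
def mismatch_distortion_unbounded_rate(p, rho, rho_tilde, d_rho):
    """Evaluate min over eta >= 0 of
        eta * d_rho + sum_x p[x] * max_y (rho_tilde[x][y] - eta * rho[x][y])
    for finite alphabets; rho and rho_tilde are lists of rows indexed [x][y]."""
    # Analogue of condition (iii): d_rho must exceed the smallest achievable distortion.
    d_min = sum(px * min(row) for px, row in zip(p, rho))
    if d_rho <= d_min:
        raise ValueError("need d_rho > sum_x p[x] * min_y rho[x][y]")

    def dual(eta):
        return eta * d_rho + sum(
            px * max(rt - eta * r for r, rt in zip(rho_x, rt_x))
            for px, rho_x, rt_x in zip(p, rho, rho_tilde)
        )

    # The objective is convex and piecewise linear, so its minimum over eta >= 0
    # is attained at eta = 0 or where two of the lines cross for some x.
    candidates = {0.0}
    for rho_x, rt_x in zip(rho, rho_tilde):
        for i in range(len(rho_x)):
            for j in range(i + 1, len(rho_x)):
                if rho_x[i] != rho_x[j]:
                    eta = (rt_x[i] - rt_x[j]) / (rho_x[i] - rho_x[j])
                    if eta > 0:
                        candidates.add(eta)
    return min(dual(eta) for eta in candidates)

# Three-letter alphabet, design measure |x - y|, true measure (x - y)^2.
p = [1 / 3, 1 / 3, 1 / 3]
rho = [[abs(x - y) for y in range(3)] for x in range(3)]
rho_tilde = [[(x - y) ** 2 for y in range(3)] for x in range(3)]
print(mismatch_distortion_unbounded_rate(p, rho, rho_tilde, d_rho=0.5))
```

In this toy case the worst-case coupling spends the whole $\rho$-budget on jumps of distance 2, which is why the squared-error distortion can be twice the absolute-error budget.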

Example 1. Let

$$\rho(x, y) = (y - x)^T W_x (y - x),$$
$$\tilde{\rho}(x, y) = (y - x)^T \tilde{W}_x (y - x),$$

where $W_x$ and $\tilde{W}_x$ are positive definite for $P$ almost every $x$. Let $P \in \mathcal{P}(\mathcal{X})$ be such that

$$\mathbb{E}_P X^T W_X X < \infty.$$

With this, Assumptions (i) and (ii) of Theorem 3 are satisfied. Applying the theorem yields that for
$D_\rho(\infty) < D_\rho < \infty$,

$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \min_{\eta \ge 0} \Big( \eta D_\rho + \mathbb{E}_P \sup_{y \in \mathcal{Y}} (y - X)^T (\tilde{W}_X - \eta W_X)(y - X) \Big), \tag{4}$$

and whenever this quantity is finite then also

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = D_{\rho,\tilde{\rho}}(\infty, D_\rho).$$

If $\tilde{W}_x - \eta W_x$ in (4) is not negative semidefinite for some $x$, then it has at least one strictly positive
eigenvalue $\nu > 0$ with corresponding eigenvector $v$. Setting $y = x - av$ yields

$$(y - x)^T (\tilde{W}_x - \eta W_x)(y - x) = a^2 \nu v^T v \to \infty$$

as $a \to \infty$. Hence the $\eta$ minimizing (4) is always such that $\tilde{W}_x - \eta W_x$ is negative semidefinite for $P$
almost every $x$. In this case

$$\sup_{y \in \mathcal{Y}} (y - x)^T (\tilde{W}_x - \eta W_x)(y - x) = 0,$$

and we obtain

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = D_\rho \inf \{\eta \ge 0 : \tilde{W}_x - \eta W_x \preceq 0 \ P\text{-a.e.}\}, \tag{5}$$

where $\tilde{W}_x - \eta W_x \preceq 0$ means that the matrix on the left-hand side is negative semidefinite.
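When the two weight matrices do not depend on $x$, the infimum in (5) is a single number: the smallest $\eta$ for which $\eta W - \tilde{W}$ is positive semidefinite. The sketch below approximates it by bisection with a Cholesky-based definiteness test. The assumption of $x$-independent matrices, the bisection approach, and all names are choices made here for illustration only.

```python
import math

def is_positive_definite(a):
    """Return True if the symmetric matrix a (list of rows) has a Cholesky factor."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

def smallest_dominating_eta(w, w_tilde, tol=1e-10):
    """Approximate inf{eta >= 0 : w_tilde - eta * w is negative semidefinite}
    for constant positive definite w and w_tilde."""
    n = len(w)

    def dominated(eta):
        return is_positive_definite(
            [[eta * w[i][j] - w_tilde[i][j] for j in range(n)] for i in range(n)]
        )

    hi = 1.0
    while not dominated(hi):  # w is positive definite, so this terminates
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dominated(mid):
            hi = mid
        else:
            lo = mid
    return hi

eye = [[1.0, 0.0], [0.0, 1.0]]
eta_star = smallest_dominating_eta(eye, [[2.0, 0.0], [0.0, 0.5]])
print(eta_star)  # multiply by D_rho to obtain the limit in (5)
```

For $W = I$ and $\tilde{W} = \mathrm{diag}(a, b)$ with $a > b$, the threshold is $a$, which agrees with the value $a D_\rho$ stated in (10) of Example 4.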
B. Properties of $D_{\rho,\tilde{\rho}}(R, D_\rho)$

The function $D_{\rho,\tilde{\rho}}(R, D_\rho)$ exhibits the following behavior:

$$D_{\rho,\tilde{\rho}}(R, D_\rho) \in \begin{cases} \{-\infty\} & \text{if } R < R_\rho(D_\rho), \\ \mathbb{R}_+ \cup \{\pm\infty\} & \text{if } R = R_\rho(D_\rho), \\ \mathbb{R}_+ \cup \{\infty\} & \text{if } R > R_\rho(D_\rho). \end{cases}$$

Moreover, a simple argument shows that $D_{\rho,\tilde{\rho}}(R, D_\rho)$ is concave and increasing in both its arguments,
and continuous at all points $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$. $D_{\rho,\tilde{\rho}}(R, D_\rho)$ is necessarily discontinuous at
$(R_\rho(D_\rho), D_\rho)$, but could be either left- or right-continuous (as a function of either $R$ or $D_\rho$). This implies
that the function either equals $\infty$ for all $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$ or is finite on this whole range.
The two types of possible behaviors of $D_{\rho,\tilde{\rho}}(R, D_\rho)$ are depicted in Figure 2.

Fig. 2. Possible behaviors of $D_{\rho,\tilde{\rho}}(R, D_\rho)$.

The next two theorems describe the behavior of $D_{\rho,\tilde{\rho}}(R, D_\rho)$ in more detail. Theorem 4 provides
conditions under which $D_{\rho,\tilde{\rho}}(R, D_\rho) = \infty$ for all $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$. In these situations,
we cannot make any guarantees about the performance of a source code of rate $R$ designed for distortion
measure $\rho$ and distortion level $D_\rho$ when used for distortion measure $\tilde{\rho}$. Theorem 5 gives sufficient
conditions such that $D_{\rho,\tilde{\rho}}(R, D_\rho) \ge 0$ for $(R, D_\rho)$ with $R = R_\rho(D_\rho)$, and conditions for $D_{\rho,\tilde{\rho}}(R, D_\rho)$
to be right-continuous in $R$ at those points.

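Theorem 4. If

(i) $0 < R < \infty$

(ii) $D_\rho > D_\rho(R)$

(iii) there exists $y_0 \in \mathcal{Y}$ such that $\mathbb{E}_P \rho(X, y_0) = D_0 < \infty$

(iv) there exist $\{A_k\}_{k \ge 1} \subset \mathcal{B}(\mathcal{X})$, $\{y_k^*\}_{k \ge 1} \subset \mathcal{Y}$ such that

$$\mathbb{E}_P(\rho(X, y_k^*); A_k) < \infty \quad \text{for all } k \ge 1,$$
$$P(A_k) \inf_{x \in A_k} \tilde{\rho}(x, y_k^*) \to \infty \quad \text{as } k \to \infty,$$
$$\sup_{x \in A_k} \rho(x, y_k^*) / \tilde{\rho}(x, y_k^*) \to 0 \quad \text{as } k \to \infty,$$

then $D_{\rho,\tilde{\rho}}(R, D_\rho) = \infty$.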

Remark. The second and third parts of Assumption (iv) are satisfied, for example, if $\tilde{\rho}(x, y) \to \infty$ and
$\rho(x, y)/\tilde{\rho}(x, y) \to 0$ when $\|y - x\| \to \infty$. See also Example 2.



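Example 2. Let $\rho(x, y) = d(y - x)^r$, $\tilde{\rho}(x, y) = \tilde{d}(y - x)^{\tilde{r}}$ for arbitrary norms $d, \tilde{d} : \mathbb{R}^m \to \mathbb{R}_+$, and
for $r, \tilde{r} \ge 1$. Let $P \in \mathcal{P}(\mathcal{X})$ be such that $\mathbb{E}_P d(X)^r < \infty$. With slight abuse of notation, we shall write
$\rho(x - y)$ for $\rho(x, y)$ and similarly for $\tilde{\rho}$ in this example.

Case 1: $r < \tilde{r}$. We first show that the conditions of Theorem 4 are satisfied. Since all norms on a finite
dimensional space are equivalent, there exist $a_1, a_2 > 0$ such that

$$a_1 d(z) \le \tilde{d}(z) \le a_2 d(z)$$

for all $z \in \mathbb{R}^m$, and thus there exist $b_1, b_2 > 0$ such that

$$b_1 \rho(x - y)^{\tilde{r}/r} \le \tilde{\rho}(x - y) \le b_2 \rho(x - y)^{\tilde{r}/r}$$

for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Hence, we have

$$\rho(x - y)/\tilde{\rho}(x - y) \le \frac{1}{b_1} \rho(x - y)^{(r - \tilde{r})/r}$$

for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Let $A = [-c, c]^m$, and choose $c$ such that $P(A) > 0$. Set $y_k^* = k\mathbf{1}$, where
$\mathbf{1} = (1, \ldots, 1) \in \mathbb{R}^m$. With this

$$\sup_{x \in A} \rho(x - y_k^*)/\tilde{\rho}(x - y_k^*) \le \sup_{x \in A} \frac{1}{b_1} \rho(x - y_k^*)^{(r - \tilde{r})/r} = \max_{x \in A} \frac{1}{b_1} d(x - y_k^*)^{r - \tilde{r}} \to 0$$

as $k \to \infty$, satisfying Assumption (iv.3) of Theorem 4. Moreover,

$$P(A) \inf_{x \in A} \tilde{\rho}(x - y_k^*) = P(A) \min_{x \in A} \tilde{d}(x - y_k^*)^{\tilde{r}} \to \infty$$

as $k \to \infty$, satisfying Assumption (iv.2) of Theorem 4. Finally,

$$\mathbb{E}_P \rho(X - y_k^*) \le \mathbb{E}_P \big(d(y_k^*) + d(X)\big)^r.$$

By Jensen's inequality

$$\big(\tfrac{1}{2} d(y_k^*) + \tfrac{1}{2} d(X)\big)^r \le \tfrac{1}{2} d(y_k^*)^r + \tfrac{1}{2} d(X)^r,$$

and hence

$$\mathbb{E}_P \rho(X, y_k^*) \le 2^{r-1} \big(d(y_k^*)^r + \mathbb{E}_P d(X)^r\big) < \infty$$

for all $k \ge 1$. Therefore, with $y_0 = 0$, we have $\mathbb{E}_P \rho(X - y_0) < \infty$ and $\mathbb{E}_P(\rho(X - y_k^*); A) < \infty$, satisfying
Assumptions (iii) and (iv.1) of Theorem 4. Thus applying the theorem with $A_k = A$ yields

$$D_{\rho,\tilde{\rho}}(R, D_\rho) = \infty$$

for all $0 < R < \infty$ and $D_\rho > D_\rho(R)$.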

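Case 2: $r = \tilde{r}$. Clearly $\rho$ and $\tilde{\rho}$ are continuous, and $\mathbb{E}_P \rho(X) < \infty$. Hence Theorem 3 asserts that for
$D_\rho(\infty) < D_\rho < \infty$

$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \min_{\eta \ge 0} \Big( \eta D_\rho + \mathbb{E}_P \sup_{y \in \mathcal{Y}} \big(\tilde{\rho}(X, y) - \eta \rho(X, y)\big) \Big) = \min_{\eta \ge 0} \Big( \eta D_\rho + \sup_{z \in \mathbb{R}^m} \big(\tilde{\rho}(z) - \eta \rho(z)\big) \Big), \tag{6}$$

and that this quantity is equal to $\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho)$ whenever it is finite.
Set

$$v^* \in \arg\max_{v \in \mathbb{R}^m : d(v) = 1} \tilde{d}(v).$$

Since $\tilde{d}$ is continuous and $\{v : d(v) = 1\}$ is compact, at least one such maximizer exists. It is easy to
check that

$$\sup_{z \in \mathbb{R}^m} \big(\tilde{\rho}(z) - \eta \rho(z)\big) = \sup_{a \ge 0} \big(a^r \tilde{d}(v^*)^r - \eta a^r\big) = \sup_{a \ge 0} a^r \big(\tilde{d}(v^*)^r - \eta\big), \tag{7}$$

where we have used $r = \tilde{r}$. In other words, the maximizing $z$ is of the form $av^*$ for some $a \ge 0$. If
$\eta < \tilde{d}(v^*)^r$ then

$$\sup_{a \ge 0} a^r \big(\tilde{d}(v^*)^r - \eta\big) = \lim_{a \to \infty} a^r \big(\tilde{d}(v^*)^r - \eta\big) = \infty.$$

On the other hand, if $\eta \ge \tilde{d}(v^*)^r$, then

$$\sup_{a \ge 0} a^r \big(\tilde{d}(v^*)^r - \eta\big) = \lim_{a \to 0} a^r \big(\tilde{d}(v^*)^r - \eta\big) = 0.$$

Therefore the minimizing $\eta \ge 0$ in (6) is equal to $\tilde{d}(v^*)^r$ and

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = D_\rho \tilde{d}(v^*)^r.$$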



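Case 3: $r > \tilde{r}$. Recall that, as in (7),

$$\sup_{z \in \mathbb{R}^m} \big(\tilde{\rho}(z) - \eta \rho(z)\big) = \sup_{a \ge 0} \big(a^{\tilde{r}} \tilde{d}(v^*)^{\tilde{r}} - \eta a^r\big).$$

The optimal $a^* \ge 0$ maximizing this quantity is

$$a^* = \Big( \frac{\tilde{r}}{r \eta} \tilde{d}(v^*)^{\tilde{r}} \Big)^{1/(r - \tilde{r})},$$

which by Theorem 3 implies that for $D_\rho(\infty) < D_\rho < \infty$

$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \min_{\eta \ge 0} \big( \eta D_\rho + \eta^{\tilde{r}/(\tilde{r} - r)} b \big) = \min_{\eta \ge 0} g(\eta),$$

where

$$b \triangleq \Big(1 - \frac{\tilde{r}}{r}\Big) \Big(\frac{\tilde{r}}{r}\Big)^{\tilde{r}/(r - \tilde{r})} \tilde{d}(v^*)^{r\tilde{r}/(r - \tilde{r})}.$$

The $\eta^*$ minimizing $g$ is

$$\eta^* = \Big( \frac{(r - \tilde{r}) D_\rho}{\tilde{r} b} \Big)^{(\tilde{r} - r)/r},$$

which finally yields

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = \frac{r}{r - \tilde{r}} \Big( \frac{(r - \tilde{r}) D_\rho}{\tilde{r}} \Big)^{\tilde{r}/r} b^{(r - \tilde{r})/r} = D_\rho^{\tilde{r}/r}\, \tilde{d}(v^*)^{\tilde{r}}.$$

For $\tilde{d} = d$, $r = 2$, $\tilde{r} = 1$, this reduces to

$$\lim_{R \to \infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = \sqrt{D_\rho}.$$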
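Case 2 becomes fully explicit when both norms are $\ell_p$ norms, as in the pooling stage of the perceptual models from Section I. The sketch below assumes $d = \|\cdot\|_p$ and $\tilde{d} = \|\cdot\|_{\tilde{p}}$ with finite $p, \tilde{p} \ge 1$ and a common power $r$. This specialization is an assumption made here for illustration, and it relies on the standard comparison inequalities between $\ell_p$ norms on $\mathbb{R}^m$; the function names are hypothetical.

```python
def lp_norm(v, p):
    """l_p norm of v for finite p >= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def worst_case_norm_ratio(m, p, p_tilde):
    """Maximum of ||v||_{p_tilde} over all v in R^m with ||v||_p = 1."""
    if p_tilde >= p:
        return 1.0  # attained at a coordinate vector such as (1, 0, ..., 0)
    return m ** (1.0 / p_tilde - 1.0 / p)  # attained at the all-equal vector

def asymptotic_mismatch(d_rho, m, p, p_tilde, r):
    """D_rho * d_tilde(v*)**r, the Case 2 limit, for l_p and l_{p_tilde} pooling."""
    return d_rho * worst_case_norm_ratio(m, p, p_tilde) ** r

# A code designed for l_4 pooling, judged under l_2 pooling, on m = 16 coefficients.
print(asymptotic_mismatch(d_rho=0.1, m=16, p=4, p_tilde=2, r=1))
# The reverse direction incurs no penalty in the limit.
print(asymptotic_mismatch(d_rho=0.1, m=16, p=2, p_tilde=4, r=1))
```

The asymmetry is worth noting: under these assumptions, designing for the smaller pooling exponent is the safer choice, since the guarantee then carries over to larger exponents without a dimension-dependent factor.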







Theorem 4 characterizes the behavior of $D_{\rho,\tilde{\rho}}(R, D_\rho)$ for $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$. The next
theorem characterizes the behavior of $D_{\rho,\tilde{\rho}}(R, D_\rho)$ for $(R, D_\rho)$ such that $R = R_\rho(D_\rho)$.

Theorem 5. Let the distortion measure $\rho$ be continuous, and $D_\rho > 0$. If there exist compact sets
$K_k \subset \mathcal{X}$, $M_k \subset \mathcal{Y}$ such that $P(K_k) \to 1$ as $k \to \infty$ and

$$\inf_{x \in K_k,\, y \in M_k^c} \rho(x, y) \to \infty \tag{8}$$

as $k \to \infty$, then $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho) \ge 0$, i.e., the set over which we optimize in (3) is non-empty.
If, in addition, $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) < \infty$ for some $r > 0$, $\tilde{\rho}$ is continuous, and there exist $a > 1$ and
$c \ge 0$ such that $\tilde{\rho}^a \le c + \rho$, then

$$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho)+, D_\rho) = D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho).$$

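Remark. Condition (8) is satisfied, for example, for $\rho$ such that $\rho(x, y) \to \infty$ as $\|y - x\|_2 \to \infty$. Indeed,
for $K_k = [-k, k]^m$ and $M_k = [-2k, 2k]^m$,

$$\lim_{k \to \infty} P(K_k) = 1,$$

and

$$\inf_{x \in K_k,\, y \in M_k^c} \rho(x, y) \ge \inf_{\|y - x\|_2 \ge k} \rho(x, y) \to \infty$$

as $k \to \infty$.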



Example 3. Given a class of distortion measures $\Gamma$, the following approach is suggested in [12] to find
the "closest" one to the $\tilde{\rho}$ implemented by the human visual system: Determine $D_{\rho,\tilde{\rho}}(R, D_\rho(R))$ for each
$\rho \in \Gamma$ and pick a minimizer $\rho^*$. In situations where a unique distribution $Q$ with $Q_X = P$ achieving
$D_\rho(R)$ exists, $D_{\rho,\tilde{\rho}}(R, D_\rho(R))$ can be found empirically by generating samples from $Q$ and having them
evaluated by human subjects. The hope is that the distortion measure minimizing $D_{\rho,\tilde{\rho}}(R, D_\rho(R))$ should
be a good approximation to $\tilde{\rho}$ also for non-optimal image compression schemes. Formally, this amounts
to assuming that $D_{\rho,\tilde{\rho}}(R + r, D_\rho(R))$ is close to $D_{\rho,\tilde{\rho}}(R, D_\rho(R))$ (at least for small $r$). Hence this approach
is only valid if $D_{\rho,\tilde{\rho}}(R + r, D_\rho(R))$ is right continuous in $r$ at $r = 0$.

Theorem 5 gives conditions under which this is indeed the case. In [12], $\mathcal{X} = \mathcal{Y} = \mathbb{R}_+^m$, and each $\rho \in \Gamma$
is of the form

$$\rho(\boldsymbol{x}, \boldsymbol{y}) = \big\| \big([v(x_1), \ldots, v(x_m)] - [v(y_1), \ldots, v(y_m)]\big) W \big\|_2^2$$

for some monotonically increasing concave function $v : \mathbb{R}_+ \to \mathbb{R}$ and some matrix $W \in \mathbb{R}^{m \times m}$. In order
to apply Theorem 5, we need the additional assumptions that $v$ is continuous at 0, that $v(s) \to \infty$
as $s \to \infty$, that $W^T W$ is positive definite, and that the $\tilde{\rho}$ implemented by the human visual system is
continuous and bounded. From Theorem 5 we obtain that under these conditions (implicitly made
in [12]) $D_{\rho,\tilde{\rho}}(R + r, D_\rho(R))$ is indeed right continuous at $r = 0$, showing that $\rho^*$ should yield a good
approximation to $\tilde{\rho}$ also for compression schemes that are only close to optimal.

We consider the problem of finding an optimal $\rho \in \Gamma$ approximating a given $\tilde{\rho}$ in more detail in
Section II-D.

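C. Computing $D_{\rho,\tilde{\rho}}(R, D_\rho)$

Define

$$R_{\rho,\tilde{\rho}}(D_\rho, D_{\tilde{\rho}}) \triangleq \inf I(Q),$$

where the infimum is taken over all $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$, $\mathbb{E}_Q \rho \le D_\rho$ and $\mathbb{E}_Q \tilde{\rho} \ge D_{\tilde{\rho}}$.
Setting

$$\mathcal{S}_1 \triangleq \{(R, D_\rho, D_{\tilde{\rho}}) : D_{\tilde{\rho}} \le D_{\rho,\tilde{\rho}}(R, D_\rho)\},$$
$$\mathcal{S}_2 \triangleq \{(R, D_\rho, D_{\tilde{\rho}}) : R \ge R_{\rho,\tilde{\rho}}(D_\rho, D_{\tilde{\rho}})\},$$

it is easy to show that the closures of $\mathcal{S}_1$ and $\mathcal{S}_2$ are identical. It is convenient in the following to analyze
$R_{\rho,\tilde{\rho}}(D_\rho, D_{\tilde{\rho}})$ instead of $D_{\rho,\tilde{\rho}}(R, D_\rho)$.
Define

$$\mathcal{Q}_1(D_\rho, D_{\tilde{\rho}}) \triangleq \{Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) : Q_X = P,\ \mathbb{E}_Q \rho \le D_\rho,\ \mathbb{E}_Q \tilde{\rho} \ge D_{\tilde{\rho}}\},$$
$$\mathcal{Q}_2(D_\rho, D_{\tilde{\rho}}) \triangleq \{Q \in \mathcal{Q}_1(D_\rho, D_{\tilde{\rho}}) : Q \ll \lambda_{\mathbb{R}^m \times \mathbb{R}^m}\},$$

where $\lambda_{\mathbb{R}^m \times \mathbb{R}^m}$ is Lebesgue measure on $\mathbb{R}^m \times \mathbb{R}^m$. Note that if $Q \ll \lambda_{\mathbb{R}^m \times \mathbb{R}^m}$, i.e., $Q$ is absolutely
continuous with respect to Lebesgue measure, then $Q$ admits a density.

The next theorem gives conditions under which we can restrict the minimization in the definition
of $R_{\rho,\tilde{\rho}}(D_\rho, D_{\tilde{\rho}})$ to distributions admitting a density. We then use this result to find tighter bounds on
$R_{\rho,\tilde{\rho}}(D_\rho, D_{\tilde{\rho}})$ for the important class of difference distortion measures.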

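Theorem 6. If

(i) $\rho, \tilde{\rho}$ are continuous

(ii) there exist $a > 0$, $c > 0$, $\varepsilon > 0$ such that for all $(x, y) \in A = \{(x, y) : \rho(x, y) \ge a\}$

$$\sup_{\|z\| \le \varepsilon} \rho(x, y + z) \le c \rho(x, y)$$

(iii) $P \ll \lambda_{\mathbb{R}^m}$

then for all $\delta > 0$

$$\inf_{Q \in \mathcal{Q}_2(D_\rho + \delta, D_{\tilde{\rho}} - \delta)} I(Q) \le \inf_{Q \in \mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q) \le \inf_{Q \in \mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q).$$

If, in addition,

(iv) $\inf_{Q \in \mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q)$ is continuous at $(D_\rho, D_{\tilde{\rho}})$ (as a function of $(D_\rho, D_{\tilde{\rho}})$)

then

$$\inf_{Q \in \mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q) = \inf_{Q \in \mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q).$$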

We say that $\rho$ and $\tilde{\rho}$ are difference distortion measures if $\rho(x, y)$ and $\tilde{\rho}(x, y)$ are functions of $y - x$.
With some abuse of notation we shall write $\rho(y - x)$ and $\tilde{\rho}(y - x)$ in this case. The next theorem provides
a lower bound on $R_{\rho,\tilde{\rho}}(D_\rho, D_{\tilde{\rho}})$, similar to the Shannon lower bound for $R_\rho(D_\rho)$.

Theorem 7. Let $\rho, \tilde{\rho}$ be difference distortion measures, and let $P \ll \lambda_{\mathbb{R}^m}$ have finite differential entropy.
If there exist $\eta, \tilde{\eta} \ge 0$, and $\alpha$, such that $f : \mathbb{R}^m \to \mathbb{R}_+$ defined by

$$f(z) = \exp\big( -\alpha - \eta \rho(z) + \tilde{\eta} \tilde{\rho}(z) \big)$$

satisfies

$$\int f(z)\,dz = 1,$$
$$\int \rho(z) f(z)\,dz = D_\rho,$$
$$\int \tilde{\rho}(z) f(z)\,dz = D_{\tilde{\rho}},$$

then

$$\inf_{Q \in \mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q) \ge \max\{0, h(X) - h(Z)\} = \max\{0, h(X) - \alpha - \eta D_\rho + \tilde{\eta} D_{\tilde{\rho}}\}, \tag{9}$$

where $X \sim P$ and $Z$ has density $f$. If, in addition, there exists a random variable $Y$ independent of $Z$
such that $X = Y + Z$, then we have equality in (9).

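Example 4. Let $\mathcal{X} = \mathcal{Y} = \mathbb{R}^2$,

$$\rho(x, y) = (y - x)^T \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} (y - x),$$

$$\tilde{\rho}(x, y) = (y - x)^T \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} (y - x),$$

with $a > b > 0$, and let $X$ be Gaussian with mean $0$ and covariance matrix $I$. The asymptotic expression
(and upper bound) given by Theorem 3 is

$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = a D_\rho, \tag{10}$$

and on the boundary

$$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho) = \tfrac{1}{2}(a + b) D_\rho. \tag{11}$$

We now apply Theorems 6 and 7 to compute $D_{\rho,\tilde{\rho}}(R, D_\rho)$ for intermediate values of $R$. The density of
$Z$ from Theorem 7 is given by

$$f(z) = \exp\Big( -\alpha - z^T \begin{pmatrix} \eta - \tilde{\eta} a & 0 \\ 0 & \eta - \tilde{\eta} b \end{pmatrix} z \Big).$$

Let

$$\sigma_1^2 \triangleq 1/(2\eta - 2a\tilde{\eta}),$$
$$\sigma_2^2 \triangleq 1/(2\eta - 2b\tilde{\eta}),$$

and note that $0 < \sigma_2^2 < \sigma_1^2$. With this, $f$ is a Gaussian density with two independent components with
mean zero and variances $\sigma_1^2$ and $\sigma_2^2$. For the bound on $\inf_{Q \in \mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q)$ given by Theorem 7 to be
tight, we need to show that $X = Y + Z$ for some independent random variable $Y$. This is the case if
$\sigma_1^2 < 1$ (and hence also $\sigma_2^2 < 1$).
In terms of $\sigma_1^2$ and $\sigma_2^2$, we have

$$\mathbb{E}\rho(Z) = \sigma_1^2 + \sigma_2^2,$$
$$\mathbb{E}\tilde{\rho}(Z) = a\sigma_1^2 + b\sigma_2^2,$$
$$h(X) - h(Z) = -\tfrac{1}{2} \log(\sigma_1^2) - \tfrac{1}{2} \log(\sigma_2^2).$$

A short computation reveals that for

$$\sigma_1^2 = \tfrac{1}{2} \big(1 + \sqrt{1 - \exp(-2r)}\big) D_\rho,$$
$$\sigma_2^2 = \tfrac{1}{2} \big(1 - \sqrt{1 - \exp(-2r)}\big) D_\rho,$$

we have

$$\mathbb{E}\rho(Z) = D_\rho,$$
$$\mathbb{E}\tilde{\rho}(Z) = \tfrac{1}{2} \big((a + b) + \sqrt{1 - \exp(-2r)}\,(a - b)\big) D_\rho,$$
$$h(X) - h(Z) = R_\rho(D_\rho) + r.$$

Thus, by Theorems 6 and 7,

$$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) \le \tfrac{1}{2} \big((a + b) + \sqrt{1 - \exp(-2r)}\,(a - b)\big) D_\rho.$$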

And for

$$0 < D_\rho < 2/\big(1 + \sqrt{1 - \exp(-2r)}\big),$$

we have $\sigma_1^2 < 1$ and hence this bound is tight. In particular, this is the case for $0 < D_\rho < 1$. As a quick
sanity check, we see that indeed

$$\lim_{r \to 0} D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) = \tfrac{1}{2}(a + b) D_\rho,$$
$$\lim_{r \to \infty} D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) = a D_\rho,$$

which are the values found in (10) and (11).

Fig. 3. $D_{\rho,\tilde{\rho}}(R, D_\rho)$ from Example 4 with $a = 2$ and $b = 0.5$.

For $0 < D_\rho < 1$, the ratio between the value for finite $r$ and the limiting expression as $r \to \infty$ is
independent of $D_\rho$ and given by

$$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) / D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \big((a + b) + \sqrt{1 - \exp(-2r)}\,(a - b)\big) / 2a.$$

We see that this converges to one quickly as $r \to \infty$, as is shown in Figure 4. Hence in this case the
limiting expression found in Theorem 3 is approached rapidly, and is hence a fairly tight upper bound on
$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho)$ even for small values of $r$.
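The closed-form expressions of this example are easy to evaluate numerically. The sketch below implements the bound and the ratio above. The function names are hypothetical, and the range check simply mirrors the condition on $D_\rho$ under which the bound is stated to be tight.

```python
import math

def example4_mismatch_distortion(r, d_rho, a, b):
    """(1/2) * ((a + b) + sqrt(1 - exp(-2r)) * (a - b)) * D_rho at excess rate r >= 0."""
    s = math.sqrt(1.0 - math.exp(-2.0 * r))
    if not 0.0 < d_rho < 2.0 / (1.0 + s):
        raise ValueError("d_rho lies outside the range in which the bound is tight")
    return 0.5 * ((a + b) + s * (a - b)) * d_rho

def example4_ratio_to_limit(r, a, b):
    """Ratio of the finite-excess-rate value to the limit a * D_rho."""
    s = math.sqrt(1.0 - math.exp(-2.0 * r))
    return ((a + b) + s * (a - b)) / (2.0 * a)

for excess_rate in (0.0, 0.5, 1.0, 2.0):
    print(excess_rate, example4_ratio_to_limit(excess_rate, a=2.0, b=0.5))
```

For $a = 2$ and $b = 0.5$ the ratio is roughly 0.92 at $r = 0.5$ and roughly 0.97 at $r = 1$, consistent with the figures quoted in the caption of Figure 4.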





D. Choosing a "Representative" of a Class of Distortion Measures

Let $\Gamma$ and $\tilde{\Gamma}$ denote classes of distortion measures. In this section, we consider the question of how a
good "representative" $\rho \in \Gamma$ of $\tilde{\Gamma}$ can be chosen (in a sense to be made precise).

Consider again the oracle producing source codes as mentioned in the introduction, but assume this
time that when queried, we can also supply the oracle with a distortion measure $\rho \in \Gamma$. The oracle then
produces a source code $f_n$ such that

$$\frac{1}{n} \log |f_n(\mathcal{X}^n)| \le R,$$

$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho.$$

Knowing the set of all $\{\Delta_\rho\}_{\rho \in \Gamma}$, and given a $\tilde{\Gamma}$, how should we choose $\rho \in \Gamma$ to query the oracle with
such that $f_n$ will "work well" for all $\tilde{\rho} \in \tilde{\Gamma}$?

Fig. 4. $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) / D_{\rho,\tilde{\rho}}(\infty, D_\rho)$ from Example 4 as a function of $r$ with $a = 2$, $b = 0.5$, for all values $0 < D_\rho < 1$. Note
that for an excess rate of $r = 0.5$, we are already at over 90% of the limiting value; at an excess rate of $r = 1$, we are at over 97% of the
limiting value.

This problem has the following operational significance. Assume we have a collection T of tractable 
distortion measures (i.e., distortion measures for which we are able to design good source codes). Assume 
furthermore, we know that the true distortion measure lies in some class T. We can choose a source code 
designed for one of the tractable distortion measures in T. We are then using this source code with 
respect to any of the distortion measures in T. While in the previous sections we were only analyzing 
the performance guarantees under mismatched distortion measures, here we also get to choose p E T in 
order to minimize the mismatch. 

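The parameters $\{\Delta_\rho\}_{\rho\in\Gamma}$ allow us to account for the difficulty of constructing a source code for distortion measure $\rho$ (see also Example 5 below). Note, however, that there are several reasonable ways in which "work well" in the question above can be defined. We will consider two such definitions in the following. For rate $R$, define

$$D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) \triangleq \inf_{\rho\in\Gamma}\,\sup_{\tilde\rho\in\tilde\Gamma} D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho),$$

$$\Delta_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) \triangleq \inf_{\rho\in\Gamma}\,\sup_{\tilde\rho\in\tilde\Gamma} \big(D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho) - D_{\tilde\rho}(R)\big).$$

We assume throughout that the $\{\Delta_\rho\}_{\rho\in\Gamma}$ satisfy

$$\inf_{\rho\in\Gamma} \Delta_\rho > 0.$$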
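When both classes are finite, the infimum and supremum in the two definitions reduce to a minimum over rows and a maximum over columns of a table of mismatch distortions. The following is a minimal sketch under that assumption; the function name `best_representative`, the labels, and every number in the table are hypothetical and serve only to illustrate the bookkeeping.

```python
def best_representative(mismatch, baseline=None):
    """Pick the candidate rho minimizing the worst case over all rho_tilde.

    mismatch[rho][rho_t] holds a value of D_{rho,rho_t}(R, D_rho(R) + Delta_rho).
    If baseline[rho_t] = D_{rho_t}(R) is given, the excess over it is used
    instead, which corresponds to the second definition.
    """
    def worst_case(row):
        if baseline is None:
            return max(row.values())
        return max(value - baseline[rho_t] for rho_t, value in row.items())

    best = min(mismatch, key=lambda rho: worst_case(mismatch[rho]))
    return best, worst_case(mismatch[best])


# Hypothetical table: two tractable measures, two possible true measures.
table = {
    "rho_a": {"true_1": 1.0, "true_2": 3.0},
    "rho_b": {"true_1": 2.0, "true_2": 2.5},
}
optimal = {"true_1": 0.5, "true_2": 2.25}  # hypothetical values of D_{rho_t}(R)

print(best_representative(table))           # worst-case distortion criterion
print(best_representative(table, optimal))  # worst-case excess criterion
```

With these made-up numbers the two criteria select different representatives, which illustrates why the two definitions are kept separate.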

The next example illustrates why introducing $\{\Delta_\rho\}_{\rho\in\Gamma}$ is necessary.

Example 5. Fix distortion measures $\rho$, $\tilde\rho$, and let $\Gamma = \{a\rho\}_{a\ge1}$. All distortion measures in $\Gamma$ are equivalent (in the sense that constructing source codes for $\rho$ is as difficult as constructing source codes for any $a\rho$). So we should have that all $a\rho$ represent $\tilde\rho$ equally well (in the sense that for appropriately chosen $D_{a\rho}$, $D_{a\rho,\tilde\rho}(R, D_{a\rho})$ is the same for all $a \ge 1$). As we will see in a moment, this forces the introduction of the quantities $\{\Delta_{a\rho}\}_{a\ge1}$.

For any fixed $D_\rho$, we have

$$D_{a\rho,\tilde\rho}(R, D_\rho) = D_{\rho,\tilde\rho}(R, D_\rho/a),$$

which goes either to $0$ (if $R \ge R_\rho(0)$) or to $-\infty$ as $a \to \infty$. This shows that we should look at source codes constructed with distortion level relative to $D_{a\rho}(R)$. Assume then we try to minimize $D_{a\rho,\tilde\rho}(R, D_{a\rho}(R) + \Delta)$ for some fixed $\Delta > 0$. We have

$$D_{a\rho,\tilde\rho}(R, D_{a\rho}(R) + \Delta) = D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta/a).$$

Thus, again, the minimum is achieved as $a \to \infty$, irrespective of the choice of $\tilde\rho$. This shows that we should not choose $\Delta_{a\rho}$ as a constant. The natural choice in this example is $\Delta_{a\rho} = a\Delta$, for which

$$D_{a\rho,\tilde\rho}(R, D_{a\rho}(R) + \Delta_{a\rho}) = D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta),$$

as expected.
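The scaling identities used in this example follow directly from the definitions. The short check below assumes the usual definition of $D_\rho(R)$ as the infimum of $\mathbb{E}_Q\rho$ over all $Q$ with $Q_X = P$ and $I(Q) \le R$, and the definition of $D_{\rho,\tilde\rho}$ as a supremum of $\mathbb{E}_Q\tilde\rho$ under the constraints $Q_X = P$, $I(Q) \le R$, $\mathbb{E}_Q\rho \le D_\rho$.

```latex
\begin{align*}
\mathbb{E}_Q(a\rho) \le D \iff \mathbb{E}_Q\rho \le D/a
  &\implies D_{a\rho,\tilde\rho}(R, D) = D_{\rho,\tilde\rho}(R, D/a), \\
D_{a\rho}(R) = \inf_{Q:\, Q_X = P,\ I(Q)\le R} \mathbb{E}_Q(a\rho)
  &= a\, D_\rho(R), \\
D_{a\rho,\tilde\rho}\big(R, D_{a\rho}(R) + \Delta\big)
  &= D_{\rho,\tilde\rho}\big(R, (a D_\rho(R) + \Delta)/a\big)
   = D_{\rho,\tilde\rho}\big(R, D_\rho(R) + \Delta/a\big).
\end{align*}
```

Substituting $a\Delta$ for $\Delta$ in the last line removes the dependence on $a$, which is the reason for scaling the slack with the distortion measure.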

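The following two corollaries of Theorems 1 and 2, respectively, establish the operational meaning of $D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\})$ and $\Delta_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\})$.

Corollary 8. Let $\Gamma$, $\tilde\Gamma$ be classes of distortion measures such that for all $\rho \in \Gamma$ there exists a $y_0 = y_0(\rho) \in \mathcal{Y}$ satisfying $\mathbb{E}_P\rho(X, y_0) < \infty$. For every $\rho \in \Gamma$, $R > 0$, and $D_{\tilde\Gamma}$, $\Delta_{\tilde\Gamma}$ such that

$$0 \le D_{\tilde\Gamma} < \lim_{\delta\to0} D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho - \delta\}),$$
$$0 \le \Delta_{\tilde\Gamma} < \lim_{\delta\to0} \Delta_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho - \delta\}),$$

a) there exists $\tilde\rho \in \tilde\Gamma$ and sequences of source codes $\{f_n\}_{n\ge1}$ such that

$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| \le R,$$
$$\limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho,$$
$$\liminf_{n\to\infty} \mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \ge D_{\tilde\Gamma}.$$

b) there exists $\tilde\rho \in \tilde\Gamma$ and sequences of source codes $\{f_n\}_{n\ge1}$ such that

$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| \le R,$$
$$\limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho,$$
$$\liminf_{n\to\infty} \big(\mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) - D_{\tilde\rho}(R)\big) \ge \Delta_{\tilde\Gamma}.$$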

Corollary 9. a) For every $\delta > 0$ there exists $\rho \in \Gamma$ such that if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfies

$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R,$$
$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho,$$

then

$$\sup_{\tilde\rho\in\tilde\Gamma} \mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \le D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) + \delta.$$

b) For every $\delta > 0$ there exists $\rho \in \Gamma$ such that if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfies

$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R,$$
$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho,$$

then

$$\sup_{\tilde\rho\in\tilde\Gamma} \big(\mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) - D_{\tilde\rho}(R)\big) \le \Delta_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) + \delta.$$

Corollaries 8 and 9 allow us to make guarantees about the performance of source codes constructed with respect to the best "representative" $\rho \in \Gamma$ of $\tilde\Gamma$. Indeed, by Corollary 9 there exists $\rho \in \Gamma$ such that if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ is a source code of rate $R$ designed for distortion measure $\rho$ and distortion level $D_\rho(R) + \Delta_\rho$, then $f_n$ is also a source code for any distortion measure $\tilde\rho \in \tilde\Gamma$ and distortion level $D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) + \delta$. Moreover, this is essentially the best guarantee one can make, since by Corollary 8 there exist source codes with the same blocklength $n$ and the same rate $R$ designed for any distortion measure $\rho \in \Gamma$ and distortion level $D_\rho(R) + \Delta_\rho$ that result in a distortion level of more than

$$D_{\Gamma,\tilde\Gamma}\big(R - \delta(n), \{\Delta_\rho - \tilde\delta(n)\}\big) - \delta(n)$$

for some distortion measure $\tilde\rho \in \tilde\Gamma$, and with $\delta(n), \tilde\delta(n) \to 0$ as $n \to \infty$.
Example 6. Let $\tilde\Gamma = \{\tilde\rho\}$, and

$$\Gamma \triangleq \{\rho(x, y) = w_x\|y - x\|_2^2 : w \in \mathcal{W} \subset (\mathcal{X} \to \mathbb{R}_+)\}.$$

Let $P \in \mathcal{P}(\mathcal{X})$ be such that $\mathbb{E}_P w_X\|X\|_2^2 < \infty$ for all $w \in \mathcal{W}$. In [13], the authors show how vector quantizers can be relatively easily constructed for distortion measures in the class $\Gamma$ defined here. Given a more sophisticated distortion measure $\tilde\rho$, it is thus of interest to find the "closest" $\rho \in \Gamma$ to $\tilde\rho$. In other words, for some $\delta > 0$, we want to find a $\rho \in \Gamma$ such that

$$D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho) \le D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) + \delta.$$

Computing $D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\})$ could be done numerically; to obtain some insight we will instead minimize $D_{\rho,\tilde\rho}(\infty, D_\rho(R) + \Delta_\rho)$. This will lead to an upper bound on $D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\})$, and thus allows us to make performance guarantees. Moreover, as we have seen in Example 4, this bound can be quite good even for finite values of $R$. To be specific, let $\tilde\rho(x, y) = (y - x)^T W_x (y - x)$ for $W_x$ positive definite $P$-almost everywhere. Let $w^\rho \in \mathcal{W}$ be the weight function corresponding to distortion measure $\rho \in \Gamma$. Then, from an earlier example,

$$D_{\rho,\tilde\rho}(\infty, D_\rho(R) + \Delta_\rho) = (D_\rho(R) + \Delta_\rho)\,\min\{\eta : W_x - \eta w^\rho_x I \preceq 0 \ \ P\text{-a.e.}\}$$
$$= (D_\rho(R) + \Delta_\rho)\,\operatorname*{ess\,sup}_{x\in\mathcal{X}} \lambda_1(W_x)/w^\rho_x,$$

where $\lambda_1(W_x)$ is the largest eigenvalue of $W_x$, and where the essential supremum is with respect to $P$. Hence

$$D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) \le \inf_{\rho\in\Gamma}\Big((D_\rho(R) + \Delta_\rho)\,\operatorname*{ess\,sup}_{x\in\mathcal{X}} \lambda_1(W_x)/w^\rho_x\Big).$$

In other words, the optimal "representative" $\rho \in \Gamma$ of $\tilde\rho$ finds the best tradeoff between the difficulty of constructing source codes for $\rho$ (captured by the term $D_\rho(R) + \Delta_\rho$) and the closeness to $\tilde\rho$ (captured by the term $\operatorname{ess\,sup}\,\lambda_1(W_x)/w^\rho_x$).
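The tradeoff in the last bound can be evaluated directly once the source is replaced by finitely many points. The sketch below makes several assumptions that are not in the text: the matrices $W_x$ are $2 \times 2$ and symmetric (so the largest eigenvalue has a closed form), a maximum over a finite list of points stands in for the essential supremum, and the candidate weights together with their levels $D_\rho(R) + \Delta_\rho$ are made-up inputs.

```python
import math


def largest_eigenvalue_2x2(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return (a + d) / 2 + math.hypot((a - d) / 2, b)


def closeness(W_by_x, w_by_x):
    """max over the listed points x of lambda_1(W_x) / w_x."""
    return max(largest_eigenvalue_2x2(*W) / w for W, w in zip(W_by_x, w_by_x))


def pick_weight(W_by_x, candidates):
    """candidates maps a name to (w_by_x, D_rho(R) + Delta_rho).

    Returns the candidate minimizing level * closeness, with the bound value.
    """
    bound = {
        name: level * closeness(W_by_x, w)
        for name, (w, level) in candidates.items()
    }
    best = min(bound, key=bound.get)
    return best, bound[best]


# Hypothetical W_x at three source points, each given as (a, b, d).
W = [(2.0, 0.0, 1.0), (4.0, 0.0, 4.0), (1.0, 0.0, 1.0)]
candidates = {
    "constant": ([1.0, 1.0, 1.0], 1.0),  # hypothetical weights and level
    "adapted": ([2.0, 4.0, 1.0], 0.5),
}
print(pick_weight(W, candidates))
```

A weight function that tracks $\lambda_1(W_x)$ keeps the closeness term small, but it only wins overall if its level $D_\rho(R) + \Delta_\rho$ does not grow by more than the closeness term shrinks.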

In the last example, we have taken a sophisticated distortion measure $\tilde\rho$ and found a good tractable approximation in $\Gamma$ for it. This approach poses the following question. Even if $\tilde\rho$ is a very good model for (say) the human visual system, it will certainly be different from it. In this situation, it is not clear if minimizing $D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho)$ is meaningful. Indeed, if $\rho^*$ is the distortion measure implemented by the human visual system, we should really be minimizing $D_{\rho,\rho^*}(R, D_\rho(R) + \Delta_\rho)$ instead. The next proposition provides conditions under which $D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho)$ and $D_{\rho,\rho^*}(R, D_\rho(R) + \Delta_\rho)$ are close and hence the approach of Example 6 is reasonable.

Proposition 10. Let $\rho_1, \rho_2, \rho_3$ be continuous distortion measures. Then

$$D_{\rho_1,\rho_3}(R, D) \le D_{\rho_1,\rho_2}(R, D) + \mathbb{E}_P\Big(\sup_{y\in\mathcal{Y}} \big(\rho_3(X, y) - \rho_2(X, y)\big)\Big)$$

and

$$D_{\rho_1,\rho_3}(R, D) \ge D_{\rho_1,\rho_2}(R, D) - \mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X, y) - \rho_2(X, y)|.$$

Example 7. Setting $\rho_1 = \rho_2$, Proposition 10 shows that

$$|D_{\rho_2,\rho_3}(R, D_{\rho_2}(R)) - D_{\rho_2}(R)| \le \mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X, y) - \rho_2(X, y)|.$$

Thus if

$$\mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X, y) - \rho_2(X, y)|$$

is small then the distortion measures $\rho_2$ and $\rho_3$ are almost equivalent (from the point of view of source coding). Moreover, if $\rho_3$ is the actual distortion measure (implemented, e.g., by the human visual system), and $\rho_2$ is a sophisticated model for it (e.g., $\rho_2(x, y) = (y - x)^T W_x (y - x)$ as in Example 6), then small

$$\mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X, y) - \rho_2(X, y)|$$

guarantees that minimizing $D_{\rho_1,\rho_2}(R, D_{\rho_1} + \Delta_{\rho_1})$ over all $\rho_1 \in \Gamma$ (as is done in Example 6) is essentially equivalent to minimizing $D_{\rho_1,\rho_3}(R, D_{\rho_1} + \Delta_{\rho_1})$. Hence, when constructing a model $\rho_2$ for the distortion measure $\rho_3$ implemented by the human visual system, it is reasonable to choose the model parameters such that

$$\mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X, y) - \rho_2(X, y)|$$

is minimized.
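For finite alphabets the model-fit criterion of this example can be computed exactly. The following minimal sketch assumes finite source and reconstruction alphabets, with each distortion measure stored as a table indexed by `[x][y]`; the function name `model_gap` and the two example tables are hypothetical.

```python
def model_gap(P, rho2, rho3):
    """E_P sup_y |rho3(X, y) - rho2(X, y)| for finite alphabets."""
    return sum(
        px * max(abs(r3 - r2) for r2, r3 in zip(row2, row3))
        for px, row2, row3 in zip(P, rho2, rho3)
    )


P = [0.5, 0.5]                         # source distribution
rho_model = [[0.0, 1.0], [1.0, 0.0]]   # hypothetical model rho_2
rho_true = [[0.0, 1.2], [0.9, 0.0]]    # hypothetical true measure rho_3
print(model_gap(P, rho_model, rho_true))  # approximately 0.15
```

By Proposition 10, this single number bounds how far the mismatch distortion computed with the model can be from the one computed with the true measure, uniformly in $R$ and $D$.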

III. Proofs 

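A. Proof of Theorem 1

A slight modification of Lemma 9.3.1 and the first part of the proof of Theorem 9.6.2 in [14] shows that for every $\delta > 0$ there exists a sequence of source codes $\{\tilde f_n\}_{n\ge1}$ such that

$$\lim_{n\to\infty} P^n(A_n) = 0, \qquad \lim_{n\to\infty} \frac{1}{n}\log|\tilde f_n(\mathcal{X}^n)| \le R, \qquad (12)$$

where

$$A_n \triangleq \{x^n : \rho_n(x^n, \tilde f_n(x^n)) > D_\rho - \delta/2\} \cup \{x^n : \tilde\rho_n(x^n, \tilde f_n(x^n)) < D_{\tilde\rho}\}.$$

Let

$$B_n \triangleq \{x^n : \rho_n(x^n, \tilde f_n(x^n)) > D_\rho - \delta/2\} \subset A_n,$$

and set

$$f_n(x^n) \triangleq \begin{cases} y_0^n & \text{if } x^n \in B_n, \\ \tilde f_n(x^n) & \text{else,} \end{cases}$$

where $y_0^n \triangleq (y_0, \ldots, y_0) \in \mathcal{Y}^n$. We have $|f_n(\mathcal{X}^n)| \le |\tilde f_n(\mathcal{X}^n)| + 1$ and hence by (12)

$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| = \lim_{n\to\infty} \frac{1}{n}\log|\tilde f_n(\mathcal{X}^n)| \le R.$$

Moreover,

$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho - \delta/2 + \mathbb{E}(\rho_n(X^n, f_n(X^n)); B_n),$$

and for any $b > 0$

$$\mathbb{E}(\rho_n(X^n, f_n(X^n)); B_n) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}(\rho(X_i, y_0); B_n)$$
$$\le \frac{1}{n}\sum_{i=1}^n \Big(\mathbb{E}\big(\rho(X_i, y_0); \{\rho(X_i, y_0) \le b\} \cap B_n\big) + \mathbb{E}\big(\rho(X_i, y_0); \{\rho(X_i, y_0) > b\}\big)\Big)$$
$$\le b P^n(A_n) + \mathbb{E}_P\big(\rho(X, y_0); \{\rho(X, y_0) > b\}\big).$$

Since $\mathbb{E}_P\rho(X, y_0) < \infty$, there exists $b > 0$ such that $\mathbb{E}_P(\rho(X, y_0); \{\rho(X, y_0) > b\}) \le \delta/2$. Hence using (12),

$$\limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho.$$

Finally,

$$\mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \ge \mathbb{E}(\tilde\rho_n(X^n, f_n(X^n)); A_n^c),$$

and hence by (12)

$$\liminf_{n\to\infty} \mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \ge D_{\tilde\rho}.$$

B. Proof of Theorem 2

Let $\rho' = -\tilde\rho$. If

$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R,$$
$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho,$$
$$\mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \ge D_{\tilde\rho},$$

then we also have

$$\mathbb{E}\rho'_n(X^n, f_n(X^n)) \le -D_{\tilde\rho} \triangleq D_{\rho'}.$$

By [5, Theorem 1.b], for every $\delta > 0$ there exists $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$ and

$$I(Q) \le R + \delta,$$
$$\mathbb{E}_Q\rho \le D_\rho,$$
$$\mathbb{E}_Q\rho' \le D_{\rho'}.$$

Therefore

$$D_{\tilde\rho} \le \mathbb{E}_Q\tilde\rho \le D_{\rho,\tilde\rho}(R + \delta, D_\rho),$$

and maximizing over the choice of $D_{\tilde\rho}$ yields the first part of the theorem.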

For the second part, we need to show that $D_{\rho,\tilde\rho}(\cdot, D_\rho)$ is continuous for $R > R_\rho(D_\rho)$. We first show that $D_{\rho,\tilde\rho}(\cdot, D_\rho)$ is concave. Fix $\delta > 0$. Let $Q_1, Q_2 \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$, both with $\mathcal{X}$ marginal $P$, and such that $I(Q_i) \le R_i$, $\mathbb{E}_{Q_i}\rho \le D_\rho$, and $\mathbb{E}_{Q_i}\tilde\rho \ge D_{\rho,\tilde\rho}(R_i, D_\rho) - \delta$ for $i \in \{1, 2\}$. Setting $Q = \alpha Q_1 + (1 - \alpha)Q_2$, we have $\mathbb{E}_Q\rho \le D_\rho$ and

$$\mathbb{E}_Q\tilde\rho = \alpha\mathbb{E}_{Q_1}\tilde\rho + (1 - \alpha)\mathbb{E}_{Q_2}\tilde\rho$$
$$\ge \alpha D_{\rho,\tilde\rho}(R_1, D_\rho) + (1 - \alpha)D_{\rho,\tilde\rho}(R_2, D_\rho) - \delta.$$

Since mutual information is convex in the conditional distribution [15, Corollary 5.5.5],

$$I(Q) \le \alpha I(Q_1) + (1 - \alpha)I(Q_2) \le \alpha R_1 + (1 - \alpha)R_2.$$

Hence

$$D_{\rho,\tilde\rho}(\alpha R_1 + (1 - \alpha)R_2, D_\rho) \ge \alpha D_{\rho,\tilde\rho}(R_1, D_\rho) + (1 - \alpha)D_{\rho,\tilde\rho}(R_2, D_\rho) - \delta.$$

Since $\delta > 0$ is arbitrary, this proves concavity of $D_{\rho,\tilde\rho}(\cdot, D_\rho)$. Moreover $D_{\rho,\tilde\rho}(\cdot, D_\rho)$ is increasing, and therefore this implies that it is right-continuous except for possibly at the point $R_\rho(D_\rho)$. From this, the result follows.
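The convexity step in this argument can be checked numerically on a small example. Mixing two joint distributions that share the $\mathcal{X}$ marginal $P$ is the same as mixing their conditional distributions, so the sketch below, which assumes finite alphabets and uses two hypothetical binary channels, compares $I(Q)$ for the mixture against the corresponding mixture of mutual informations.

```python
import math


def mutual_information(P, channel):
    """I(X;Y) in nats for source P[x] and conditional distribution channel[x][y]."""
    n_y = len(channel[0])
    q_y = [sum(P[x] * channel[x][y] for x in range(len(P))) for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(n_y):
            joint = px * channel[x][y]
            if joint > 0:
                total += joint * math.log(channel[x][y] / q_y[y])
    return total


def mix(ch1, ch2, alpha):
    """Entrywise convex combination alpha * ch1 + (1 - alpha) * ch2."""
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(r1, r2)]
        for r1, r2 in zip(ch1, ch2)
    ]


P = [0.5, 0.5]
noiseless = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical Q_1: Y = X
useless = [[0.5, 0.5], [0.5, 0.5]]    # hypothetical Q_2: Y independent of X
alpha = 0.5
lhs = mutual_information(P, mix(noiseless, useless, alpha))
rhs = alpha * mutual_information(P, noiseless) + (1 - alpha) * mutual_information(P, useless)
print(lhs <= rhs)
```

The gap between the two sides is what makes the mixture $Q$ feasible at rate $\alpha R_1 + (1 - \alpha)R_2$ in the concavity argument.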

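C. Proof of Theorem 3

We first show that

$$\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_{\rho,\tilde\rho}(\infty, D_\rho).$$

$D_{\rho,\tilde\rho}(\infty, D_\rho) < \infty$ by Assumption (iv), and therefore there exists $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$, $\mathbb{E}_Q\rho \le D_\rho$, and $\mathbb{E}_Q\tilde\rho \ge D_{\rho,\tilde\rho}(\infty, D_\rho) - \varepsilon$. Let $K_i \subset \mathcal{X} \times \mathcal{Y}$ be compact and such that $Q(K_i) \ge 1 - 1/i$ for all $i \ge 1$. Thus $Q(\cup_{i\ge1} K_i) = 1$, and therefore by dominated convergence (using Assumption (iv) for the first line and Assumption (ii) for the second line)

$$\lim_{I\to\infty} \mathbb{E}_Q(\tilde\rho; \cup_{i=1}^I K_i) = \mathbb{E}_Q\tilde\rho,$$
$$\lim_{I\to\infty} \mathbb{E}_Q(\rho(X, y_0); (\cup_{i=1}^I K_i)^c) = 0.$$

Hence there exists a compact $K \subset \mathcal{X} \times \mathcal{Y}$ such that

$$\mathbb{E}_Q(\tilde\rho; K) \ge \mathbb{E}_Q\tilde\rho - \varepsilon, \qquad (13)$$
$$\mathbb{E}_Q(\rho(X, y_0); K^c) \le \varepsilon. \qquad (14)$$

Since $\rho$ and $\tilde\rho$ are continuous by Assumption (i), they are uniformly continuous on the compact set $K$. Hence there exists $\delta > 0$ such that

$$|\rho(x, y) - \rho(x', y')| \le \varepsilon,$$
$$|\tilde\rho(x, y) - \tilde\rho(x', y')| \le \varepsilon,$$

whenever $\|x - x'\| + \|y - y'\| \le \delta$. Now, since $K$ is compact, there exists some finite $L$ and $\{(x_\ell, y_\ell)\}_{\ell=1}^L \subset K$ and a finite measurable partition $\{A_\ell\}_{\ell=1}^L$ of $K$ such that $\|x - x_\ell\| + \|y - y_\ell\| \le \delta$ for all $(x, y) \in A_\ell$ and for all $\ell \in \{1, \ldots, L\}$.

Define

$$\tilde Y \triangleq \begin{cases} y_0 & \text{if } (X, Y) \in K^c, \\ y_\ell & \text{if } (X, Y) \in A_\ell. \end{cases}$$

Since $K$ and $\{A_\ell\}_{\ell=1}^L$ are measurable, $\tilde Y$ is a random variable. Let $\tilde Q$ be the distribution of $(X, \tilde Y)$ when $(X, Y) \sim Q$. Since $\tilde Y$ takes on at most $L + 1$ values, we have

$$I(\tilde Q) \le \log(L + 1) < \infty.$$

Moreover,

$$\mathbb{E}_{\tilde Q}\rho = \sum_{\ell=1}^L \mathbb{E}_Q(\rho(X, y_\ell); A_\ell) + \mathbb{E}_Q(\rho(X, y_0); K^c)$$
$$\le \sum_{\ell=1}^L \mathbb{E}_Q(\rho(X, Y) + \varepsilon; A_\ell) + \varepsilon$$
$$\le \mathbb{E}_Q\rho + 2\varepsilon$$
$$\le D_\rho + 2\varepsilon,$$

where the first inequality follows from the uniform continuity of $\rho$ on $K$, and from (14). And

$$\mathbb{E}_{\tilde Q}\tilde\rho \ge \sum_{\ell=1}^L \mathbb{E}_Q(\tilde\rho(X, y_\ell); A_\ell)$$
$$\ge \sum_{\ell=1}^L \mathbb{E}_Q(\tilde\rho(X, Y) - \varepsilon; A_\ell)$$
$$\ge \mathbb{E}_Q(\tilde\rho; K) - \varepsilon$$
$$\ge \mathbb{E}_Q\tilde\rho - 2\varepsilon$$
$$\ge D_{\rho,\tilde\rho}(\infty, D_\rho) - 3\varepsilon,$$

where the first inequality drops the nonnegative contribution of $K^c$, the second follows from the uniform continuity of $\tilde\rho$ on $K$, the third uses $Q(K) \le 1$, the fourth follows from (13), and the last from the choice of $Q$. Therefore

$$D_{\rho,\tilde\rho}(\infty, D_\rho) \le \lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho + 2\varepsilon) + 3\varepsilon$$
$$\le D_{\rho,\tilde\rho}(\infty, D_\rho + 2\varepsilon) + 3\varepsilon.$$

Since $D_{\rho,\tilde\rho}(\infty, \cdot)$ is concave, it is continuous at $D_\rho > D_\rho(\infty)$ (Assumption (iii)). Hence taking the limit as $\varepsilon \to 0$ yields

$$\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_{\rho,\tilde\rho}(\infty, D_\rho).$$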

We now show that

$$\mathbb{E}_P \sup_{y\in\mathcal{Y}} \big(\tilde\rho(X, y) - \eta\rho(X, y)\big) \qquad (15)$$

is well defined. Let $\eta \ge 0$,

$$f(x, y) \triangleq \tilde\rho(x, y) - \eta\rho(x, y),$$
$$g(x) \triangleq \sup_{y\in\mathcal{Y}} f(x, y).$$

By Assumption (i), $f$ is continuous, and hence

$$\sup_{y\in\mathcal{Y}} f(x, y) = \sup_{y\in\mathcal{Y}\cap\mathbb{Q}^m} f(x, y).$$

As $\mathbb{Q}^m$ is countable, this last supremum is measurable, and hence $g$ is a measurable function. Moreover,

$$g(x) \ge f(x, y_0) \ge -\eta\rho(x, y_0),$$

and hence by Assumption (ii)

$$\mathbb{E}_P g^- < \infty,$$

where $g^- \triangleq \max\{0, -g\}$ is the negative part of $g$. Thus the expectation $\mathbb{E}_P g$ in (15) is well defined.
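We next show that

$$D_{\rho,\tilde\rho}(\infty, D_\rho) \le \min_{\eta\ge0}\, \eta D_\rho + \mathbb{E}_P g. \qquad (16)$$

Consider

$$D_{\rho,\tilde\rho}(\infty, D_\rho) = \sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P,\ \mathbb{E}_Q\rho \le D_\rho} \mathbb{E}_Q\tilde\rho.$$

The right hand side is linear in $Q$ with linear constraints. Since $D_\rho > D_\rho(\infty)$ by Assumption (iii), a strictly feasible point exists. Hence, we obtain by strong duality (see, e.g., [16, Theorem 8.6.1])

$$D_{\rho,\tilde\rho}(\infty, D_\rho) = \min_{\eta\ge0}\, \eta D_\rho + \sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P} \mathbb{E}_Q(\tilde\rho - \eta\rho)$$
$$\le \min_{\eta\ge0}\, \eta D_\rho + \mathbb{E}_P g.$$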
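For finite alphabets the dual expression on the right-hand side of (16) is a convex, piecewise-linear function of $\eta$, so it can be minimized by inspecting $\eta = 0$ and the crossing points of the lines $\tilde\rho(x, y) - \eta\rho(x, y)$. The sketch below is only an illustration under assumptions not made in the text: finite alphabets, distortion tables indexed by `[x][y]`, and a level $D_\rho$ that is at least $\mathbb{E}_P \min_y \rho(X, y)$ (so the dual is bounded below). The function names and the Hamming-type example are hypothetical.

```python
from itertools import combinations


def dual_value(eta, D, P, rho, rho_t):
    """eta * D + E_P max_y (rho_t(X, y) - eta * rho(X, y))."""
    total = eta * D
    for x, px in enumerate(P):
        total += px * max(rt - eta * r for r, rt in zip(rho[x], rho_t[x]))
    return total


def mismatch_distortion_inf_rate(D, P, rho, rho_t):
    """Minimize the dual over eta >= 0 by checking eta = 0 and all kinks."""
    candidates = [0.0]
    for x in range(len(P)):
        # A kink can only occur where two lines for the same x cross.
        for (r1, t1), (r2, t2) in combinations(zip(rho[x], rho_t[x]), 2):
            if r1 != r2:
                eta = (t1 - t2) / (r1 - r2)
                if eta > 0:
                    candidates.append(eta)
    return min(dual_value(eta, D, P, rho, rho_t) for eta in candidates)


P = [0.5, 0.5]
rho = [[0.0, 1.0], [1.0, 0.0]]    # Hamming distortion
rho_t = [[0.0, 2.0], [2.0, 0.0]]  # twice the Hamming distortion
print(mismatch_distortion_inf_rate(0.25, P, rho, rho_t))
```

In this example $\tilde\rho = 2\rho$, so the primal supremum of $\mathbb{E}_Q\tilde\rho$ under $\mathbb{E}_Q\rho \le 0.25$ is $0.5$, which is the value the dual attains at $\eta = 2$.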

As the last step, we show that we have equality in (16). To this end, we have to construct a $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$ and $\mathbb{E}_Q f$ is arbitrarily close to $\mathbb{E}_P g$. Consider any positive simple function $0 \le s \le g$, i.e., $s = \sum_{j=1}^J \beta_j 1_{B_j}$ for a finite measurable partition $\{B_j\}_{j=1}^J$ of $\mathcal{X}$. Let $C_j \subset B_j$ be compact and such that $P(C_j) \ge P(B_j) - \varepsilon/J$ for all $j \in \{1, \ldots, J\}$. Since the $\{B_j\}_{j=1}^J$ are disjoint, we have $P(\cup_{j=1}^J C_j) \ge 1 - \varepsilon$. For each $x \in C_j$ and any $\delta > 0$, there exists a $y(x)$ such that

$$f(x, y(x)) \ge g(x) - \delta \ge s(x) - \delta.$$

By continuity of $f$ and since $s$ is constant on $B_j$, for each $x \in C_j \subset B_j$ there exists an open neighborhood $G_j(x)$ of $x$ such that

$$f(\tilde x, y(x)) \ge s(x) - 2\delta = s(\tilde x) - 2\delta$$

for every $\tilde x \in G_j(x) \cap B_j$. Since $C_j \subset \cup_{x\in C_j} G_j(x)$, and since $C_j$ is compact, there exists a finite subcover, say $\{G_j(x)\}_{x\in\tilde C_j}$ for some finite set $\tilde C_j \subset C_j$. Construct a finite measurable partition $\{E_k\}_{k=1}^K$ of $\cup_{j=1}^J C_j$ such that for each $k$ we have $E_k \subset G_j(x) \cap B_j$ for some $j$ and some $x \in \tilde C_j$. Call $x_k$ the $x \in \tilde C_j$ corresponding to $E_k$.

Define

$$Y \triangleq \begin{cases} y_0 & \text{if } X \in (\cup_{j=1}^J C_j)^c, \\ y(x_k) & \text{if } X \in E_k. \end{cases}$$

Since each $E_k$ is measurable, this is a random variable. Let $Q$ be the distribution of $(X, Y)$. We have

$$\mathbb{E}_Q f = \sum_{k=1}^K \mathbb{E}_P(f(X, y(x_k)); E_k) + \mathbb{E}_P(f(X, y_0); (\cup_{j=1}^J C_j)^c)$$
$$\ge \sum_{k=1}^K \mathbb{E}_P(f(X, y(x_k)); E_k) - \eta\mathbb{E}_P(\rho(X, y_0); (\cup_{j=1}^J C_j)^c)$$
$$\ge \sum_{k=1}^K \mathbb{E}_P(s(X) - 2\delta; E_k) - \eta\mathbb{E}_P(\rho(X, y_0); (\cup_{j=1}^J C_j)^c)$$
$$\ge \mathbb{E}_P s(X) - \mathbb{E}_P(\eta\rho(X, y_0) + s(X); (\cup_{j=1}^J C_j)^c) - 2\delta.$$

Recall that $P(\cup_{j=1}^J C_j) \ge 1 - \varepsilon$. Since

$$0 \le \mathbb{E}_P(\eta\rho(X, y_0) + s(X)) \le \eta\mathbb{E}_P\rho(X, y_0) + \max_{j\in\{1,\ldots,J\}} \beta_j < \infty$$

by Assumption (ii), we can choose $\varepsilon$ small enough such that

$$\mathbb{E}_P(\eta\rho(X, y_0) + s(X); (\cup_{j=1}^J C_j)^c) \le \delta.$$

With this

$$\mathbb{E}_Q f \ge \mathbb{E}_P s - 3\delta. \qquad (17)$$

Since $g$ is a measurable function, we can choose simple functions $s_i \le g$ such that $\lim_{i\to\infty} \mathbb{E}_P s_i = \mathbb{E}_P g$. In light of (17), this implies that

$$\sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P} \mathbb{E}_Q f = \mathbb{E}_P g,$$

concluding the proof.
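D. Proof of Theorem 4

$D_\rho(\cdot)$ is convex [15, Lemma 10.6.1] and hence continuous except for possibly at the boundary. By Assumption (iii), $D_\rho(0) < \infty$, and by Assumption (i), $R > 0$. Thus $D_\rho(\cdot)$ is continuous at $R$. Therefore, since $D_\rho > D_\rho(R)$ by Assumption (ii), and since $0 < R < \infty$ by Assumption (i), there exists $\varepsilon > 0$ such that $D_\rho - 2\varepsilon \ge D_\rho(R - \varepsilon)$. Hence by the definition of $D_\rho(R)$, there exists $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$, $I(Q) \le R - \varepsilon$ and $\mathbb{E}_Q\rho \le D_\rho - \varepsilon$.

Let $g_k : \mathcal{X} \to \mathcal{Y}$ be defined by

$$g_k(x) \triangleq \begin{cases} y_k^* & \text{if } x \in A_k, \\ y_0 & \text{else.} \end{cases}$$

Set $Y_k = g_k(X)$ and let $W_k$ be the distribution of $(X, Y_k)$. Set $Q_k = (1 - \alpha)Q + \alpha W_k$ for some $\alpha \in [0, 1]$. Clearly both $W_k$ and $Q_k$ have $\mathcal{X}$ marginal $P$. Mutual information is convex in the conditional distribution [15, Corollary 5.5.5], and thus

$$I(Q_k) \le (1 - \alpha)I(Q) + \alpha I(W_k) \le I(Q) + \alpha I(W_k).$$

We have $I(W_k) \le \log(2) \le 1$, and hence for $\alpha \le \varepsilon$

$$I(Q_k) \le I(Q) + \varepsilon \le R. \qquad (18)$$

Moreover, by Assumption (iii)

$$\mathbb{E}_{Q_k}\rho \le \mathbb{E}_Q\rho + \alpha\mathbb{E}_{W_k}\rho$$
$$\le D_\rho - \varepsilon + \alpha\big(\mathbb{E}_P(\rho(X, y_0); A_k^c) + \mathbb{E}_P(\rho(X, y_k^*); A_k)\big)$$
$$\le D_\rho - \varepsilon + \alpha\big(D + \mathbb{E}_P(\rho(X, y_k^*); A_k)\big).$$

Setting

$$\alpha = \frac{\varepsilon}{1 + D + \mathbb{E}_P(\rho(X, y_k^*); A_k)},$$

this becomes

$$\mathbb{E}_{Q_k}\rho \le D_\rho. \qquad (19)$$

Note that $\alpha \le \varepsilon$ as needed in (18), and $\alpha > 0$ since $\mathbb{E}_P(\rho(X, y_k^*); A_k) < \infty$ by Assumption (iv). Finally

$$\mathbb{E}_{Q_k}\tilde\rho \ge \alpha\mathbb{E}_{W_k}\tilde\rho$$
$$\ge \alpha\mathbb{E}_P(\tilde\rho(X, y_k^*); A_k)$$
$$= \varepsilon\,\frac{\mathbb{E}_P(\tilde\rho(X, y_k^*); A_k)}{1 + D + \mathbb{E}_P(\rho(X, y_k^*); A_k)},$$

and

$$\mathbb{E}_P(\rho(X, y_k^*); A_k) = \mathbb{E}_P\Big(\frac{\rho(X, y_k^*)}{\tilde\rho(X, y_k^*)}\,\tilde\rho(X, y_k^*); A_k\Big)$$
$$\le \mathbb{E}_P(\tilde\rho(X, y_k^*); A_k)\,\sup_{x\in A_k} \frac{\rho(x, y_k^*)}{\tilde\rho(x, y_k^*)}.$$

Hence by Assumption (iv),

$$\mathbb{E}_{Q_k}\tilde\rho \ge \varepsilon\Big(\frac{1 + D}{\mathbb{E}_P(\tilde\rho(X, y_k^*); A_k)} + \sup_{x\in A_k} \frac{\rho(x, y_k^*)}{\tilde\rho(x, y_k^*)}\Big)^{-1}$$
$$\ge \varepsilon\Big(\frac{1 + D}{P(A_k)\inf_{x\in A_k}\tilde\rho(x, y_k^*)} + \sup_{x\in A_k} \frac{\rho(x, y_k^*)}{\tilde\rho(x, y_k^*)}\Big)^{-1} \to \infty \qquad (20)$$

as $k \to \infty$.

Combining (18), (19), and (20), we get $D_{\rho,\tilde\rho}(R, D_\rho) = \infty$.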
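E. Proof of Theorem 5

$D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho) \ge 0$ if and only if the set of all $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$, $I(Q) \le R_\rho(D_\rho)$, $\mathbb{E}_Q\rho \le D_\rho$ is non-empty. By definition $R_\rho(D_\rho) = \inf I(Q)$, where the infimum is taken over all $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $Q_X = P$ and $\mathbb{E}_Q\rho \le D_\rho$. Hence $D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho) \ge 0$ if and only if this last infimum is attained (i.e., a minimizing $Q$ exists). By Theorem 2.2 (and the remark following its proof) in [17], this is the case when $\rho$ is continuous, $D_\rho > 0$, and the set of all $Q$ over which the infimum is taken is tight (see footnote 3).

From this, we only have to show tightness to prove the first part of the theorem. $\mathbb{E}_Q\rho \le D_\rho$ implies that

$$D_\rho \ge \mathbb{E}_Q\rho \ge Q(K_k \times M_k^c) \inf_{x\in K_k,\, y\in M_k^c} \rho(x, y),$$

and thus

$$Q(K_k \times M_k) = P(K_k) - Q(K_k \times M_k^c)$$
$$\ge P(K_k) - D_\rho \Big/ \inf_{x\in K_k,\, y\in M_k^c} \rho(x, y) \to 1$$

as $k \to \infty$. Since the sets $K_k \times M_k$ are compact, this shows tightness and proves the first part of the theorem.

The proof of the second part adapts an argument from [17, Theorem 2.2]. Note that since $D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho) < \infty$ for some $r > 0$ (and hence, by concavity, for all $r \ge 0$), for every $\varepsilon > 0$ and all $i \ge 1$ there exists $Q_i \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ with $\mathcal{X}$ marginal $P$ such that

$$I(Q_i) \le R_\rho(D_\rho) + 1/i,$$
$$\mathbb{E}_{Q_i}\rho \le D_\rho,$$
$$\mathbb{E}_{Q_i}\tilde\rho \ge D_{\rho,\tilde\rho}(R_\rho(D_\rho) + 1/i, D_\rho) - \varepsilon.$$

Since the set of all feasible distributions is tight as shown above, this implies that $\{Q_i\}_{i\ge1}$ contains a weakly convergent subsequence (see footnote 4), and we may assume without loss of generality that $Q_i \Rightarrow Q$ for some $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$. Using exactly the same argument as in [17, Theorem 2.2], we have

$$R_\rho(D_\rho) \ge \liminf_{i\to\infty} I(Q_i) \ge I(Q) \qquad (21)$$

and

$$D_\rho \ge \liminf_{i\to\infty} \mathbb{E}_{Q_i}\rho \ge \mathbb{E}_Q\rho. \qquad (22)$$

Finally, since $\tilde\rho$ is continuous, we have $\tilde\rho(X, Y_i) \Rightarrow \tilde\rho(X, Y)$, where $(X, Y_i) \sim Q_i$ and $(X, Y) \sim Q$. As

$$\sup_{i\ge1} \mathbb{E}\tilde\rho(X, Y_i)^a \le c + \sup_{i\ge1} \mathbb{E}\rho(X, Y_i) \le c + D_\rho < \infty,$$

$\{\tilde\rho(X, Y_i)\}_{i\ge1}$ is uniformly integrable. Therefore by [18, Theorem 3.5]

$$\lim_{i\to\infty} D_{\rho,\tilde\rho}(R_\rho(D_\rho) + 1/i, D_\rho) - \varepsilon \le \lim_{i\to\infty} \mathbb{E}\tilde\rho(X, Y_i) = \mathbb{E}\tilde\rho(X, Y) = \mathbb{E}_Q\tilde\rho. \qquad (23)$$

Since $\varepsilon > 0$ is arbitrary, (21), (22), and (23) imply that

$$D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho) \ge D_{\rho,\tilde\rho}(R_\rho(D_\rho)+, D_\rho).$$

As $D_{\rho,\tilde\rho}(\cdot, D_\rho)$ is increasing, we also have

$$D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho) \le D_{\rho,\tilde\rho}(R_\rho(D_\rho)+, D_\rho),$$

concluding the proof of the second part of the theorem.

Footnote 3. A set of distributions $\mathcal{Q} \subset \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ is tight if there exist compact sets $A_k \subset \mathcal{X} \times \mathcal{Y}$ such that $\sup_{Q\in\mathcal{Q}} Q(A_k^c) \to 0$ as $k \to \infty$.

Footnote 4. $Q_i$ converges weakly to $Q$ (denoted by $Q_i \Rightarrow Q$) if $\lim_{i\to\infty} \mathbb{E}_{Q_i} g = \mathbb{E}_Q g$ for all bounded and continuous functions $g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. An equivalent definition for $Q_i \Rightarrow Q$ is that $\liminf_{i\to\infty} Q_i(A) \ge Q(A)$ for all open sets $A \subset \mathcal{X} \times \mathcal{Y}$ (see [18, Theorem 2.1]). If $Z_i \sim Q_i$ and $Z \sim Q$, we write $Z_i \Rightarrow Z$ if $Q_i \Rightarrow Q$.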

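F. Proof of Theorem 6

Since $\mathcal{Q}_2(D_\rho, D_{\tilde\rho}) \subset \mathcal{Q}_1(D_\rho, D_{\tilde\rho})$, it is enough to show that for every $\delta > 0$

$$\inf_{Q\in\mathcal{Q}_1(D_\rho, D_{\tilde\rho})} I(Q) \ge \inf_{Q\in\mathcal{Q}_2(D_\rho+\delta,\, D_{\tilde\rho}-\delta)} I(Q).$$

For some $\nu > 0$, choose $Q \in \mathcal{Q}_1(D_\rho, D_{\tilde\rho})$ such that

$$I(Q) \le \inf_{Q'\in\mathcal{Q}_1(D_\rho, D_{\tilde\rho})} I(Q') + \nu.$$

Fix $\varepsilon > 0$ and let $Z$ be uniformly distributed on $(-\varepsilon, \varepsilon)^m$ and independent of $X, Y$. Define $Y_\varepsilon = Y + Z$ and let $Q_\varepsilon$ be the distribution of $(X, Y_\varepsilon)$ when $(X, Y) \sim Q$. Note that by Assumption (iii), $Q_\varepsilon \ll \lambda_{\mathbb{R}^m\times\mathbb{R}^m}$ whenever $\varepsilon > 0$ and that $Q_0 = Q$. By the data processing inequality

$$I(Q_\varepsilon) \le I(Q) \le \inf_{Q'\in\mathcal{Q}_1(D_\rho, D_{\tilde\rho})} I(Q') + \nu. \qquad (24)$$

We now show that $Q_\varepsilon \Rightarrow Q$ as $\varepsilon \to 0$ (i.e., that $Q_\varepsilon$ converges weakly to $Q$). For this, it suffices to show that for every open $G \subset \mathcal{X} \times \mathcal{Y}$ we have $\liminf_{\varepsilon\to0} Q_\varepsilon(G) \ge Q(G)$. Define

$$G_\varepsilon \triangleq \{(x, y) \in \mathcal{X} \times \mathcal{Y} : (x, y + z) \in G \ \ \forall z \in \mathbb{R}^m \text{ with } \|z\|_\infty < \varepsilon\}.$$

Since $(X, Y) \in G_\varepsilon$ implies $(X, Y_\varepsilon) \in G$, we have $Q_\varepsilon(G) \ge Q(G_\varepsilon)$. Since $G$ is open, we have $1_{G_\varepsilon} \to 1_G$ pointwise as $\varepsilon \to 0$, and hence by Fatou's lemma

$$\liminf_{\varepsilon\to0} Q_\varepsilon(G) \ge \liminf_{\varepsilon\to0} Q(G_\varepsilon) \ge Q(G).$$

Thus $Q_\varepsilon \Rightarrow Q$ as $\varepsilon \to 0$.

By continuity of $\tilde\rho$ (Assumption (i)), we get by weak convergence for every $b > 0$

$$\mathbb{E}_{Q_\varepsilon}\tilde\rho \ge \mathbb{E}_{Q_\varepsilon}\min\{\tilde\rho, b\} \to \mathbb{E}_Q\min\{\tilde\rho, b\}, \qquad (25)$$

as $\varepsilon \to 0$. Assuming $\mathbb{E}_Q\tilde\rho < \infty$, choose $b$ such that $\mathbb{E}_Q\min\{\tilde\rho, b\} \ge \mathbb{E}_Q\tilde\rho - \delta/2$. Then there exists $\varepsilon_1 > 0$ such that for $\varepsilon \le \varepsilon_1$, we have by (25)

$$\mathbb{E}_{Q_\varepsilon}\tilde\rho \ge \mathbb{E}_Q\min\{\tilde\rho, b\} - \delta/2$$
$$\ge \mathbb{E}_Q\tilde\rho - \delta$$
$$\ge D_{\tilde\rho} - \delta. \qquad (26)$$

Since $D_{\tilde\rho} < \infty$, this last conclusion follows by a similar argument if $\mathbb{E}_Q\tilde\rho = \infty$.

Moreover, by Assumption (ii)

$$\mathbb{E}_{Q_\varepsilon}\rho = \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + \mathbb{E}_{Q_\varepsilon}(\rho; A)$$
$$\le \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + \mathbb{E}_Q\Big(\sup_{z:\,\|z\|_\infty<\varepsilon} \rho(X, Y + z); A\Big)$$
$$\le \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + c\,\mathbb{E}_Q(\rho; A). \qquad (27)$$

Now note that Assumption (ii) holds also as we increase $a$. Since $\mathbb{E}_Q\rho \le D_\rho$, we have $\mathbb{E}_Q(\rho; A) \to 0$ as $a \to \infty$. Hence there exists $a$ such that Assumption (ii) holds and $c\,\mathbb{E}_Q(\rho; A) \le \delta/2$. For this $a$, we can continue (27) as

$$\mathbb{E}_{Q_\varepsilon}\rho \le \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + \delta/2$$
$$\le \mathbb{E}_{Q_\varepsilon}\min\{\rho, a\} + \delta/2.$$

By continuity of $\rho$ (Assumption (i)) and weak convergence of $Q_\varepsilon$, $\mathbb{E}_{Q_\varepsilon}\min\{\rho, a\} \to \mathbb{E}_Q\min\{\rho, a\}$ as $\varepsilon \to 0$. Hence there exists $0 < \varepsilon_2 \le \varepsilon_1$ such that for $0 < \varepsilon \le \varepsilon_2$ we have

$$\mathbb{E}_{Q_\varepsilon}\rho \le \mathbb{E}_Q\min\{\rho, a\} + \delta$$
$$\le \mathbb{E}_Q\rho + \delta$$
$$\le D_\rho + \delta. \qquad (28)$$

Combining (24), (26), and (28), we obtain for $0 < \varepsilon \le \varepsilon_2$ that $Q_\varepsilon \in \mathcal{Q}_2(D_\rho + \delta, D_{\tilde\rho} - \delta)$ and

$$I(Q_\varepsilon) \le \inf_{Q'\in\mathcal{Q}_1(D_\rho, D_{\tilde\rho})} I(Q') + \nu.$$

Since $\nu > 0$ is arbitrary, this shows that

$$\inf_{Q\in\mathcal{Q}_2(D_\rho+\delta,\, D_{\tilde\rho}-\delta)} I(Q) \le \inf_{Q\in\mathcal{Q}_1(D_\rho, D_{\tilde\rho})} I(Q),$$

proving the first part of the theorem.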

The second part follows directly from continuity of 

inf I{Q). 



G. Proof of Theorem 7

Let $Q \in \mathcal{Q}_2(D_\rho, D_{\tilde\rho})$ and $(X, Y) \sim Q$. Then $h(X)$ and $h(X|Y)$ are well defined and since $h(X)$ is finite, we have $I(Q) = h(X) - h(X|Y)$. Therefore

$$I(Q) = h(X) - h(X|Y)$$
$$= h(X) - h(X - Y|Y)$$
$$\ge h(X) - h(X - Y)$$
$$\ge h(X) - \sup_{Z:\ \mathbb{E}\rho(Z)\le D_\rho,\ \mathbb{E}\tilde\rho(Z)\ge D_{\tilde\rho}} h(Z),$$

with equality if there exists $Z = X - Y$ independent of $Y$ and achieving the supremum. By [19, Theorem 3.2], if there exist $\alpha \in \mathbb{R}$, $\eta, \tilde\eta \in \mathbb{R}_+$ such that $f : \mathbb{R}^m \to \mathbb{R}_+$ defined by

$$f(z) = \exp\big(-\alpha - \eta\rho(z) + \tilde\eta\tilde\rho(z)\big)$$

satisfies

$$\int f(z)\,dz = 1,$$
$$\int \rho(z)f(z)\,dz = D_\rho,$$
$$\int \tilde\rho(z)f(z)\,dz = D_{\tilde\rho},$$

then $f$ is the density of the maximizing $Z$, and

$$h(Z) = \alpha + \eta D_\rho - \tilde\eta D_{\tilde\rho}.$$

Thus in this case

$$\inf_{Q\in\mathcal{Q}_2(D_\rho, D_{\tilde\rho})} I(Q) \ge \max\{0,\ h(X) - \alpha - \eta D_\rho + \tilde\eta D_{\tilde\rho}\}.$$
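As a sanity check on the entropy formula, consider the special case (an assumption made only for this illustration) where $m = 1$, $\rho(z) = z^2$, and the constraint on $\tilde\rho$ is inactive, i.e. $\tilde\eta = 0$. Then $f$ is a zero-mean Gaussian density with variance $D_\rho$, and $\alpha + \eta D_\rho$ should equal the Gaussian differential entropy $\tfrac12\log(2\pi e D_\rho)$. The sketch below verifies the normalization and moment constraints by a simple midpoint rule; the function names and the integration range are hypothetical choices.

```python
import math


def gaussian_case(D_rho):
    """(alpha, eta) such that f(z) = exp(-alpha - eta * z**2) has second moment D_rho."""
    eta = 1.0 / (2.0 * D_rho)
    alpha = 0.5 * math.log(2.0 * math.pi * D_rho)
    return alpha, eta


def check_constraints(alpha, eta, half_width=50.0, steps=20_000):
    """Midpoint-rule approximations of the integrals of f and z^2 f."""
    dz = 2.0 * half_width / steps
    mass = second_moment = 0.0
    for i in range(steps):
        z = -half_width + (i + 0.5) * dz
        f = math.exp(-alpha - eta * z * z)
        mass += f * dz
        second_moment += z * z * f * dz
    return mass, second_moment


D_rho = 2.0
alpha, eta = gaussian_case(D_rho)
mass, second_moment = check_constraints(alpha, eta)
entropy = alpha + eta * D_rho  # h(Z) = alpha + eta * D_rho when eta_tilde = 0
print(mass, second_moment, entropy)
```

In this special case the bound of the theorem reduces to the familiar Shannon lower bound $h(X) - \tfrac12\log(2\pi e D_\rho)$ for squared error distortion.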

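H. Proof of Corollary 8

Let $\varepsilon, \delta > 0$ be small enough such that

$$D_{\tilde\Gamma} \le D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho - \delta\}) - 2\varepsilon$$

and

$$\inf_{\rho\in\Gamma} \Delta_\rho > \delta. \qquad (29)$$

For every $\rho \in \Gamma$, we have

$$D_{\tilde\Gamma} \le D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho - \delta\}) - 2\varepsilon$$
$$\le \sup_{\tilde\rho\in\tilde\Gamma} D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho - \delta) - 2\varepsilon$$
$$\le D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho - \delta) - \varepsilon,$$

for some $\tilde\rho \in \tilde\Gamma$. By (29), $\Delta_\rho - \delta > 0$, and hence $D_{\rho,\tilde\rho}(\cdot, D_\rho(R) + \Delta_\rho - \delta)$ is continuous at $R$. Therefore, by choosing $\delta$ small enough, we have

$$D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho - \delta) - \varepsilon \le D_{\rho,\tilde\rho}(R - \delta, D_\rho(R) + \Delta_\rho - \delta).$$

Hence for this $\tilde\rho$, Theorem 1 guarantees the existence of a sequence of source codes $\{f_n\}_{n\ge1}$ such that

$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| \le R,$$
$$\limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho,$$
$$\liminf_{n\to\infty} \mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \ge D_{\tilde\Gamma}.$$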

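This proves part a of the corollary. Part b follows similarly.

I. Proof of Corollary 9

Choose $\rho \in \Gamma$ such that

$$\sup_{\tilde\rho\in\tilde\Gamma} D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho) \le D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) + \delta.$$

For any $\tilde\rho \in \tilde\Gamma$, we have by Theorem 2

$$\mathbb{E}\tilde\rho_n(X^n, f_n(X^n)) \le D_{\rho,\tilde\rho}(R+, D_\rho(R) + \Delta_\rho). \qquad (30)$$

Since $\Delta_\rho > 0$, $D_{\rho,\tilde\rho}(\cdot, D_\rho(R) + \Delta_\rho)$ is continuous at $R$. Hence

$$D_{\rho,\tilde\rho}(R+, D_\rho(R) + \Delta_\rho) = D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho)$$
$$\le \sup_{\tilde\rho\in\tilde\Gamma} D_{\rho,\tilde\rho}(R, D_\rho(R) + \Delta_\rho)$$
$$\le D_{\Gamma,\tilde\Gamma}(R, \{\Delta_\rho\}) + \delta.$$

This proves part a of the corollary. Part b follows similarly.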

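J. Proof of Proposition 10

$$D_{\rho_1,\rho_3}(R, D) = \sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P,\ I(Q)\le R,\ \mathbb{E}_Q\rho_1\le D} \mathbb{E}_Q\rho_3$$
$$\le D_{\rho_1,\rho_2}(R, D) + \sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P} \mathbb{E}_Q(\rho_3 - \rho_2)$$
$$\le D_{\rho_1,\rho_2}(R, D) + \mathbb{E}_P\Big(\sup_{y\in\mathcal{Y}} \big(\rho_3(X, y) - \rho_2(X, y)\big)\Big).$$

And

$$D_{\rho_1,\rho_3}(R, D) = \sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P,\ I(Q)\le R,\ \mathbb{E}_Q\rho_1\le D} \mathbb{E}_Q\rho_3$$
$$\ge D_{\rho_1,\rho_2}(R, D) - \sup_{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\, Q_X = P} \mathbb{E}_Q|\rho_3 - \rho_2|$$
$$\ge D_{\rho_1,\rho_2}(R, D) - \mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X, y) - \rho_2(X, y)|.$$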
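The first chain of inequalities compresses two elementary steps, spelled out below for completeness. The symbol $\mathcal{F}$ is shorthand introduced only here for the feasible set $\{Q : Q_X = P,\ I(Q) \le R,\ \mathbb{E}_Q\rho_1 \le D\}$; it does not appear elsewhere in the paper.

```latex
\begin{align*}
\mathbb{E}_Q\rho_3 &= \mathbb{E}_Q\rho_2 + \mathbb{E}_Q(\rho_3-\rho_2), \\
\sup_{Q\in\mathcal{F}} \mathbb{E}_Q\rho_3
  &\le \sup_{Q\in\mathcal{F}} \mathbb{E}_Q\rho_2
     + \sup_{Q:\,Q_X=P} \mathbb{E}_Q(\rho_3-\rho_2), \\
\mathbb{E}_Q(\rho_3-\rho_2)
  &= \mathbb{E}_Q\big(\rho_3(X,Y)-\rho_2(X,Y)\big)
   \le \mathbb{E}_P \sup_{y\in\mathcal{Y}}\big(\rho_3(X,y)-\rho_2(X,y)\big)
   \quad\text{whenever } Q_X = P.
\end{align*}
```

The lower bound follows in the same way after writing $\mathbb{E}_Q\rho_3 \ge \mathbb{E}_Q\rho_2 - \mathbb{E}_Q|\rho_3 - \rho_2|$.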

IV. Conclusion 

In this paper, we investigated the problem of source coding with mismatched distortion measures. We derived a single-letter characterization $D_{\rho,\tilde\rho}(R, D_\rho)$ of the best distortion level with respect to $\tilde\rho$ that can be guaranteed for any source code of rate $R$ designed for distortion level $D_\rho$ with respect to $\rho$. We also derived a single-letter characterization $D_{\rho,\tilde\rho}(\infty, D_\rho)$ of the best distortion guarantee independent of the rate $R$ of the source code. We then looked at properties of $D_{\rho,\tilde\rho}(R, D_\rho)$, characterizing its behavior for $R > R_\rho(D_\rho)$ and on the boundary. We finally considered the problem of choosing a representative $\rho \in \Gamma$ of $\tilde\rho$.